# ArchiveBox UI

## Page: Getting Started

### What do you want to capture?

- Save some URLs now -> [Add page]
  - Paste some URLs to archive now
  - Upload a file containing URLs (bookmarks.html export, RSS/XML feed, markdown file, Word doc, PDF, etc.)
  - Pull in URLs to archive from a remote location (e.g. RSS feed URL, remote TXT file, JSON file, etc.)
- Import URLs from a browser -> [Import page]
  - Desktop: Get the ArchiveBox Chrome/Firefox extension
  - Mobile: Get the ArchiveBox iOS App / Android App
  - Upload a bookmarks.html export file
  - Upload a browser_history.sqlite3 export file
- Import URLs from a 3rd-party bookmarking service -> [Sync page]
  - Pocket
  - Pinboard
  - Instapaper
  - Wallabag
  - Zapier, N8N, IFTTT, etc.
  - Upload a bookmarks.html export, bookmarks.json, RSS, etc. file
- Archive URLs on a schedule -> [Schedule page]
- Archive an entire website -> [Crawl page]
  - What starting URL/domain?
  - How deep?
  - Follow links to external domains?
  - Follow links to parent URLs?
  - Maximum number of pages to save?
  - Maximum number of requests/minute?
- Crawl for URLs with a search engine and save automatically
  - Some URLs on a schedule
  - Save an entire website (e.g. `https://example.com`)
  - Save results matching a search query (e.g. `"site:example.com"`)
  - Save a social media feed (e.g. `https://x.com/user/1234567890`)

--------------------------------------------------------------------------------

### Crawls App

- Archive an entire website -> [Crawl page] (see the `Crawl` model sketch at the end of this page)
  - What are the seed URLs?
  - How many hops to follow?
  - Follow links to external domains?
  - Follow links to parent URLs?
  - Maximum number of pages to save?
  - Maximum number of requests/minute?

--------------------------------------------------------------------------------

### Scheduler App

- Archive URLs on a schedule -> [Schedule page] (see the `Schedule` model sketch at the end of this page)
  - What URL(s)?
  - How often?
  - Discard old snapshots after x amount of time?
  - Any filter rules?
  - Want to be notified when changes are detected? -> redirect [Alerts app / create new alert (crawl=self)]

* Choose Schedule to check for new URLs: `Schedule.objects.get(pk=xyz)`
  - 1 minute
  - 5 minutes
  - 1 hour
  - 1 day
* Choose Destination Crawl to archive URLs with: `Crawl.objects.get(pk=xyz)`
  - Tags
  - Persona
  - Created By ID
  - Config
  - Filters
    - URL patterns to include
    - URL patterns to exclude
    - `ONLY_NEW=` ignore URLs if already saved once / save URL each time it appears / only save if last save was > x time ago

--------------------------------------------------------------------------------

### Sources App

(For managing sources that ArchiveBox pulls URLs in from.)

- Add a new source to pull URLs in from (WIZARD)
  - Choose URI:
    - [x] Web UI
    - [x] CLI
    - Local filesystem path (directory to monitor for new files containing URLs)
    - Remote URL (RSS/JSON/XML feed) (see the RSS pull sketch at the end of this page)
    - Chrome browser profile sync (log in using Gmail to pull bookmarks/history)
    - Pocket, Pinboard, Instapaper, Wallabag, etc.
    - Zapier, N8N, IFTTT, etc.
    - Local server filesystem path (directory to monitor for new files containing URLs)
    - Google Drive (directory to monitor for new files containing URLs)
    - Remote server FTP/SFTP/SCP path (directory to monitor for new files containing URLs)
    - AWS S3/B2/GCP bucket (directory to monitor for new files containing URLs)
    - xBrowserSync (log in to pull bookmarks)
  - Choose extractor:
    - auto
    - RSS
    - Pocket
    - etc.
  - Specify extra config, e.g.:
    - credentials
    - extractor tuning options (e.g. verify_ssl, cookies, etc.)
  - Provide credentials for the source
    - API Key
    - Username / Password
    - OAuth

--------------------------------------------------------------------------------

### Alerts App

- Create a new alert, choose condition
  - Get notified when a site goes down (
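
--------------------------------------------------------------------------------

### Appendix: model sketches

The Crawls App questions map one-to-one onto fields of a crawl record. Here is a minimal sketch, assuming Django (the Scheduler section already references `Crawl.objects.get(pk=xyz)`); every field name below is a hypothetical stand-in for a wizard question, not ArchiveBox's actual schema:

```python
from django.db import models


class Crawl(models.Model):
    """Hypothetical model backing the Crawls App wizard above."""

    seed_urls = models.TextField(help_text='Newline-separated starting URLs')
    max_depth = models.PositiveIntegerField(default=1, help_text='How many hops to follow')
    follow_external = models.BooleanField(default=False, help_text='Follow links to external domains?')
    follow_parents = models.BooleanField(default=False, help_text='Follow links to parent URLs?')
    max_pages = models.PositiveIntegerField(default=100, help_text='Maximum number of pages to save')
    max_requests_per_minute = models.PositiveIntegerField(default=60, help_text='Politeness rate limit')
```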
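Similarly, the Scheduler App options suggest a `Schedule` that periodically checks for new URLs and feeds matches into a destination `Crawl`. The interval and `ONLY_NEW` choices come straight from the list above, but the field names and types are assumptions:

```python
from django.db import models


class Schedule(models.Model):
    """Hypothetical model backing the Scheduler App above."""

    INTERVAL_CHOICES = [(60, '1 minute'), (300, '5 minutes'), (3600, '1 hour'), (86400, '1 day')]
    ONLY_NEW_CHOICES = [
        ('skip', 'Ignore URLs if already saved once'),
        ('always', 'Save URL each time it appears'),
        ('stale', 'Only save if last save was > x time ago'),
    ]

    check_interval_seconds = models.PositiveIntegerField(choices=INTERVAL_CHOICES, default=3600)
    destination_crawl = models.ForeignKey('Crawl', on_delete=models.CASCADE)
    url_patterns_include = models.TextField(blank=True, help_text='One pattern per line')
    url_patterns_exclude = models.TextField(blank=True, help_text='One pattern per line')
    only_new = models.CharField(max_length=16, choices=ONLY_NEW_CHOICES, default='skip')
```

Keeping the schedule and the destination crawl as separate models matches the wizard flow: one decides *when* to look for URLs, the other decides *how* they get archived.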
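For the Sources App, the simplest pull path is the "Remote URL (RSS feed)" source. A sketch of what that pull step might look like, using the real `feedparser` library, with the surrounding function entirely assumed (ArchiveBox's actual extractor plumbing differs):

```python
import feedparser  # pip install feedparser


def pull_urls_from_rss(feed_url: str) -> list[str]:
    """Fetch an RSS/Atom feed and return the entry links it contains."""
    feed = feedparser.parse(feed_url)  # fetches and parses in one call
    return [entry.link for entry in feed.entries if 'link' in entry]


# Usage: each run returns the feed's current links; a Source poller would
# diff them against already-seen URLs (the ONLY_NEW policy) before queueing.
print(pull_urls_from_rss('https://example.com/feed.xml'))
```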