diff --git a/README.md b/README.md
index d67a7580..b8bc207b 100644
--- a/README.md
+++ b/README.md
@@ -41,454 +41,22 @@ All the saved content is static and indexed with json files, so it lives forever
-## Quickstart
+# Getting Started
-**1. Get your list of URLs:**
+ - [Details & Motivation](https://github.com/pirate/ArchiveBox/wiki)
+ - [Quickstart](https://github.com/pirate/ArchiveBox/wiki/Quickstart)
+ - [Install](https://github.com/pirate/ArchiveBox/wiki/Install)
-Follow the links here to find instructions for exporting a list of URLs from each service.
+# Documentation
- - [Pocket](https://getpocket.com/export)
- - [Pinboard](https://pinboard.in/export/)
- - [Instapaper](https://www.instapaper.com/user/export)
- - [Reddit Saved Posts](https://github.com/csu/export-saved-reddit)
- - [Shaarli](https://shaarli.readthedocs.io/en/master/guides/backup-restore-import-export/#export-links-as)
- - [Unmark.it](http://help.unmark.it/import-export)
- - [Wallabag](https://doc.wallabag.org/en/user/import/wallabagv2.html)
- - [Chrome Bookmarks](https://support.google.com/chrome/answer/96816?hl=en)
- - [Firefox Bookmarks](https://support.mozilla.org/en-US/kb/export-firefox-bookmarks-to-backup-or-transfer)
- - [Safari Bookmarks](http://i.imgur.com/AtcvUZA.png)
- - [Opera Bookmarks](http://help.opera.com/Windows/12.10/en/importexport.html)
- - [Internet Explorer Bookmarks](https://support.microsoft.com/en-us/help/211089/how-to-import-and-export-the-internet-explorer-favorites-folder-to-a-32-bit-version-of-windows)
- - Chrome History: `./bin/archivebox-export-browser-history --chrome`
- - Firefox History: `./bin/archivebox-export-browser-history --firefox`
- - Other File or URL: (e.g. RSS feed) pass as second argument in the next step
+ - [Configuration](https://github.com/pirate/ArchiveBox/wiki/Configuration)
+ - [Chromium Install](https://github.com/pirate/ArchiveBox/wiki/Chromium-Install)
+ - [Publishing Your Archive](https://github.com/pirate/ArchiveBox/wiki/Publishing-Your-Archive)
+ - [Troubleshooting](https://github.com/pirate/ArchiveBox/wiki/Troubleshooting)
- (If any of these links are broken, please submit an issue and I'll fix it)
+# More Info
-**2. Create your archive:**
-
-```bash
-git clone https://github.com/pirate/ArchiveBox
-cd ArchiveBox/
-./setup # install all dependencies
-
-# add a list of links from a file
-./archive ~/Downloads/bookmark_export.html # replace with the path to your export file or URL from step 1
-
-# OR add a list of links from remote URL
-./archive "https://getpocket.com/users/yourusername/feed/all" # url to an RSS, html, or json links file
-
-# OR add all the links from your browser history
-./bin/archivebox-export-browser-history --chrome # works with --firefox as well, can take path to SQLite history db
-./archive output/sources/chrome_history.json
-
-# OR just continue archiving the existing links in the index
-./archive # at any point if you just want to continue archiving where you left off, without adding any new links
-```
-
-**3. Done!**
-
-You can open `output/index.html` to view your archive. (favicons will appear next to each title once it has finished downloading)
-
-If you want to host your archive somewhere to share it with other people, see the [Publishing Your Archive](#publishing-your-archive) section below.
-
-**4. (Optional) Schedule it to run every day**
-
-You can import links from any local file path or feed url by changing the second argument to `archive.py`.
-ArchiveBox will ignore links that are imported multiple times, it will keep the earliest version that it's seen.
-This means you can add multiple cron jobs to pull links from several different feeds or files each day,
-it will keep the index up-to-date without duplicate links.
-
-This example archives a pocket RSS feed and an export file every 24 hours, and saves the output to a logfile.
-```bash
-0 0 * * * yourusername /opt/ArchiveBox/archive https://getpocket.com/users/yourusername/feed/all > /var/log/archivebox_rss.log
-0 0 * * * yourusername /opt/ArchiveBox/archive /home/darth-vader/Desktop/bookmarks.html > /var/log/archivebox_firefox.log
-```
-(Add the above lines to `/etc/crontab`)
-
-**Next Steps**
-
-If you have any trouble, see the [Troubleshooting](#troubleshooting) section at the bottom.
-If you'd like to customize options, see the [Configuration](#configuration) section.
-
-If you want something easier than running programs in the command-line, take a look at [Pocket Premium](https://getpocket.com/premium) (yay Mozilla!) and [Pinboard Pro](https://pinboard.in/upgrade/) (yay independent developer!). Both offer easy-to-use bookmark archiving with full-text-search and other features.
-
-## Details
-
-`archive.py` is a script that takes a [Pocket-format](https://getpocket.com/export), [JSON-format](https://pinboard.in/export/), [Netscape-format](https://msdn.microsoft.com/en-us/library/aa753582(v=vs.85).aspx), or RSS-formatted list of links, and downloads a clone of each linked website to turn into a browsable archive that you can store locally or host online.
-
-The archiver produces an output folder `output/` containing an `index.html`, `index.json`, and archived copies of all the sites,
-organized by timestamp bookmarked. It's Powered by [headless](https://developers.google.com/web/updates/2017/04/headless-chrome) Chromium and good 'ol `wget`.
-
-For each site it saves:
-
- - wget of site, e.g. `en.wikipedia.org/wiki/Example.html` with .html appended if not present
- - `output.pdf` Printed PDF of site using headless chrome
- - `screenshot.png` 1440x900 screenshot of site using headless chrome
- - `output.html` DOM Dump of the HTML after rendering using headless chrome
- - `archive.org.txt` A link to the saved site on archive.org
- - `audio/` and `video/` for sites like youtube, soundcloud, etc. (using youtube-dl) (WIP)
- - `code/` clone of any repository for github, bitbucket, or gitlab links (WIP)
- - `index.json` JSON index containing link info and archive details
- - `index.html` HTML index containing link info and archive details (optional fancy or simple index)
-
-Wget doesn't work on sites you need to be logged into, but headless Chrome does; see the [Configuration](#configuration) section for `CHROME_USER_DATA_DIR`.
-
-**Large Exports & Estimated Runtime:**
-
-I've found it takes about an hour to download 1000 articles, and they'll take up roughly 1GB.
-Those numbers are from running it single-threaded on my i5 machine with 50mbps down. YMMV.
-
-You can run it in parallel by using the `resume` feature, or by manually splitting export.html into multiple files:
-```bash
-./archive export.html 1498800000 & # second argument is timestamp to resume downloading from
-./archive export.html 1498810000 &
-./archive export.html 1498820000 &
-./archive export.html 1498830000 &
-```
-Users have reported running it with 50k+ bookmarks with success (though it will take more RAM while running).
-
-If you already imported a huge list of bookmarks and want to import only new
-bookmarks, you can use the `ONLY_NEW` environment variable. This is useful if
-you want to import a bookmark dump periodically and want to skip broken links
-which are already in the index.
-
-## Configuration
-
-You can tweak parameters via environment variables, or by editing `config.py` directly:
-```bash
-env CHROME_BINARY=google-chrome-stable RESOLUTION=1440,900 FETCH_PDF=False ./archive ~/Downloads/bookmarks_export.html
-```
-
-**Shell Options:**
- - colorize console output: `USE_COLOR` value: [`True`]/`False`
- - show progress bar: `SHOW_PROGRESS` value: [`True`]/`False`
- - archive permissions: `OUTPUT_PERMISSIONS` values: [`755`]/`644`/`...`
-
-**Dependency Options:**
- - path to Chrome: `CHROME_BINARY` values: [`chromium-browser`]/`/usr/local/bin/google-chrome`/`...`
- - path to wget: `WGET_BINARY` values: [`wget`]/`/usr/local/bin/wget`/`...`
-
-**Archive Options:**
- - maximum allowed download time per link: `TIMEOUT` values: [`60`]/`30`/`...`
- - import only new links: `ONLY_NEW` values `True`/[`False`]
- - archive methods (values: [`True`]/`False`):
- - fetch page with wget: `FETCH_WGET`
- - fetch images/css/js with wget: `FETCH_WGET_REQUISITES` (True is highly recommended)
- - print page as PDF: `FETCH_PDF`
- - fetch a screenshot of the page: `FETCH_SCREENSHOT`
- - fetch a DOM dump of the page: `FETCH_DOM`
- - fetch a favicon for the page: `FETCH_FAVICON`
- - submit the page to archive.org: `SUBMIT_ARCHIVE_DOT_ORG`
- - screenshot: `RESOLUTION` values: [`1440,900`]/`1024,768`/`...`
- - user agent: `WGET_USER_AGENT` values: [`Wget/1.19.1`]/`"Mozilla/5.0 ..."`/`...`
- - chrome profile: `CHROME_USER_DATA_DIR` values: [`~/Library/Application\ Support/Google/Chrome/Default`]/`/tmp/chrome-profile`/`...`
- To capture sites that require a user to be logged in, you must specify a path to a chrome profile (which loads the cookies needed for the user to be logged in). If you don't have an existing chrome profile, create one with `chromium-browser --disable-gpu --user-data-dir=/tmp/chrome-profile`, and log into the sites you need. Then set `CHROME_USER_DATA_DIR=/tmp/chrome-profile` to make ArchiveBox use that profile.
- - output directory: `OUTPUT_DIR` values: [`$REPO_DIR/output`]/`/srv/www/bookmarks`/`...` Optionally output the archives to an alternative directory.
-
- (See defaults & more at the top of `config.py`)
-
-To tweak the outputted html index file's look and feel, just edit the HTML files in `archiver/templates/`.
-
-The chrome/chromium dependency is _optional_ and only required for screenshots, PDF, and DOM dump output, it can be safely ignored if those three methods are disabled.
-
-## Publishing Your Archive
-
-The archive produced by `./archive` is suitable for serving on any provider that can host static html (e.g. github pages!).
-
-You can also serve it from a home server or VPS by uploading the outputted `output` folder to your web directory, e.g. `/var/www/ArchiveBox` and configuring your webserver.
-
-Here's a sample nginx configuration that works to serve archive folders:
-
-```nginx
-location / {
- alias /path/to/ArchiveBox/output/;
- index index.html;
- autoindex on; # see directory listing upon clicking "The Files" links
- try_files $uri $uri/ =404;
-}
-```
-
-Make sure you're not running any content as CGI or PHP, you only want to serve static files!
-
-Urls look like: `https://archive.example.com/archive/1493350273/en.wikipedia.org/wiki/Dining_philosophers_problem.html`
-
-**Security WARNING & Content Disclaimer**
-
-Re-hosting other people's content has security implications for any other sites sharing your hosting domain. Make sure you understand
-the dangers of hosting unknown archived CSS & JS files [on your shared domain](https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy).
-Due to the security risk of serving some malicious JS you archived by accident, it's best to put this on a domain or subdomain
-of its own to keep cookies separate and slightly mitigate [CSRF attacks](https://en.wikipedia.org/wiki/Cross-site_request_forgery) and other nastiness.
-
-You may also want to blacklist your archive in `/robots.txt` if you don't want to be publicly associated with all the links you archive via search engine results.
-
-Be aware that some sites you archive may not allow you to rehost their content publicly for copyright reasons,
-it's up to you to host responsibly and respond to takedown requests appropriately.
-
-Please modify the `FOOTER_INFO` config variable to add your contact info to the footer of your index.
-
-## Info & Motivation
-
-This is basically an open-source version of [Pocket Premium](https://getpocket.com/premium) (which you should consider paying for!).
-I got tired of sites I saved going offline or changing their URLS, so I started
-archiving a copy of them locally now, similar to The Way-Back Machine provided
-by [archive.org](https://archive.org). Self hosting your own archive allows you to save
-PDFs & Screenshots of dynamic sites in addition to static html, something archive.org doesn't do.
-
-Now I can rest soundly knowing important articles and resources I like won't disappear off the internet.
-
-My published archive as an example: [archive.sweeting.me](https://archive.sweeting.me).
-
-## Manual Setup
-
-If you don't like running random setup scripts off the internet (:+1:), you can follow these manual setup instructions.
-
-**1. Install dependencies:** `chromium >= 59`, `wget >= 1.16`, `python3 >= 3.5` (`google-chrome >= v59` works fine as well)
-
-If you already have Google Chrome installed, or wish to use that instead of Chromium, follow the [Google Chrome Instructions](#google-chrome-instructions).
-
-```bash
-# On Mac:
-brew cask install chromium # If you already have Google Chrome/Chromium in /Applications/, skip this command
-brew install wget python3
-
-echo -e '#!/bin/bash\n/Applications/Chromium.app/Contents/MacOS/Chromium "$@"' > /usr/local/bin/chromium-browser # see instructions for google-chrome below
-chmod +x /usr/local/bin/chromium-browser
-```
-
-```bash
-# On Ubuntu/Debian:
-apt install chromium-browser python3 wget
-```
-
-```bash
-# Check that everything worked:
-chromium-browser --version && which wget && which python3 && which curl && echo "[√] All dependencies installed."
-```
-
-**2. Get your bookmark export file:**
-
-Follow the instruction links above in the "Quickstart" section to download your bookmarks export file.
-
-**3. Run the archive script:**
-
-1. Clone this repo `git clone https://github.com/pirate/ArchiveBox`
-2. `cd ArchiveBox/`
-3. `./archive ~/Downloads/bookmarks_export.html`
-
-You may optionally specify a second argument to `archive.py export.html 153242424324` to resume the archive update at a specific timestamp.
-
-If you have any trouble, see the [Troubleshooting](#troubleshooting) section at the bottom.
-
-### Google Chrome Instructions:
-
-I recommend Chromium instead of Google Chrome, since it's open source and doesn't send your data to Google.
-Chromium may have some issues rendering some sites though, so you're welcome to try Google-chrome instead.
-It's also easier to use Google Chrome if you already have it installed, rather than downloading Chromium all over.
-
-1. Install & link google-chrome
-```bash
-# On Mac:
-# If you already have Google Chrome in /Applications/, skip this brew command
-brew cask install google-chrome
-brew install wget python3
-
-echo -e '#!/bin/bash\n/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome "$@"' > /usr/local/bin/google-chrome
-chmod +x /usr/local/bin/google-chrome
-```
-
-```bash
-# On Linux:
-wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
-sudo sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
-apt update; apt install google-chrome-stable python3 wget
-```
-
-2. Set the environment variable `CHROME_BINARY` to `google-chrome` before running:
-
-```bash
-env CHROME_BINARY=google-chrome ./archive ~/Downloads/bookmarks_export.html
-```
-If you're having any trouble trying to set up Google Chrome or Chromium, see the Troubleshooting section below.
-
-## Troubleshooting
-
-### Dependencies
-
-**Python:**
-
-On some Linux distributions the python3 package might not be recent enough.
-If this is the case for you, resort to installing a recent enough version manually.
-```bash
-add-apt-repository ppa:fkrull/deadsnakes && apt update && apt install python3.6
-```
-If you still need help, [the official Python docs](https://docs.python.org/3.6/using/unix.html) are a good place to start.
-
-**Chromium/Google Chrome:**
-
-`archive.py` depends on being able to access a `chromium-browser`/`google-chrome` executable. The executable used
-defaults to `chromium-browser` but can be manually specified with the environment variable `CHROME_BINARY`:
-
-```bash
-env CHROME_BINARY=/usr/local/bin/chromium-browser ./archive ~/Downloads/bookmarks_export.html
-```
-
-1. Test to make sure you have Chrome on your `$PATH` with:
-
-```bash
-which chromium-browser || which google-chrome
-```
-If no executable is displayed, follow the setup instructions to install and link one of them.
-
-2. If a path is displayed, the next step is to check that it's runnable:
-
-```bash
-chromium-browser --version || google-chrome --version
-```
-If no version is displayed, try the setup instructions again, or confirm that you have permission to access chrome.
-
-3. If a version is displayed and it's `<59`, upgrade it:
-
-```bash
-apt upgrade chromium-browser -y
-# OR
-brew cask upgrade chromium
-```
-
-4. If a version is displayed and it's `>=59`, make sure `archive.py` is running the right one:
-
-```bash
-env CHROME_BINARY=/path/from/step/1/chromium-browser ./archive bookmarks_export.html # replace the path with the one you got from step 1
-```
-
-
-**Wget & Curl:**
-
-If you're missing `wget` or `curl`, simply install them using `apt` or your package manager of choice.
-See the "Manual Setup" instructions for more details.
-
-If wget times out or randomly fails to download some sites that you have confirmed are online,
-upgrade wget to the most recent version with `brew upgrade wget` or `apt upgrade wget`. There is
-a bug in versions `<=1.19.1_1` that caused wget to fail for perfectly valid sites.
-
-### Archiving
-
-**No links parsed from export file:**
-
-Please open an [issue](https://github.com/pirate/ArchiveBox/issues) with a description of where you got the export, and
-preferably your export file attached (you can redact the links). We'll fix the parser to support your format.
-
-**Lots of skipped sites:**
-
-If you ran the archiver once, it won't re-download sites on subsequent runs; it will only download new links.
-If you haven't already run it, make sure you have a working internet connection and that the parsed URLs look correct.
-You can check the `archive.py` output or `index.html` to see what links it's downloading.
-
-If you're still having issues, try deleting or moving the `output/archive` folder (back it up first!) and running `./archive` again.
-
-**Lots of errors:**
-
-Make sure you have all the dependencies installed and that you're able to visit the links from your browser normally.
-Open an [issue](https://github.com/pirate/ArchiveBox/issues) with a description of the errors if you're still having problems.
-
-**Lots of broken links from the index:**
-
-Not all sites can be effectively archived with each method, that's why it's best to use a combination of `wget`, PDFs, and screenshots.
-If it seems like more than 10-20% of sites in the archive are broken, open an [issue](https://github.com/pirate/ArchiveBox/issues)
-with some of the URLs that failed to be archived and I'll investigate.
-
-**Removing unwanted links from the index:**
-
-If you accidentally added lots of unwanted links to the index and they slow down your archiving, you can use the `bin/purge` script to remove them. It removes everything matching the Python regexes you pass to it, e.g. `bin/purge -r 'amazon\.com' -r 'google\.com'`. It prompts before removing links from the index, but for extra safety you might want to back up `index.json` first (or put it under version control).
-
-### Hosting the Archive
-
-If you're having issues trying to host the archive via nginx, make sure you already have nginx running with SSL.
-If you don't, google around, there are plenty of tutorials to help get that set up. Open an [issue](https://github.com/pirate/ArchiveBox/issues)
-if you have a problem with a particular nginx config.
-
-
-## Links
-
-**Similar Projects:**
- - [Reminiscence](https://github.com/kanishka-linux/reminiscence/) extremely similar to BA, uses a Django backend + UI and provides auto tagging and summary features with NLTK
- - [Memex by Worldbrain.io](https://github.com/WorldBrain/Memex) a browser extension that saves all your history and does full-text search
- - [Hypothes.is](https://web.hypothes.is/) a web/pdf/ebook annotation tool that also archives content
- - [Perkeep](https://perkeep.org/) "Perkeep lets you permanently keep your stuff, for life."
- - [Fetching.io](http://fetching.io/) A personal search engine/archiver that lets you search through all archived websites that you've bookmarked
- - [Shaarchiver](https://github.com/nodiscc/shaarchiver) very similar project that archives Firefox, Shaarli, or Delicious bookmarks and all linked media, generating a markdown/HTML index
- - [Webrecorder.io](https://webrecorder.io/) Save full browsing sessions and archive all the content
- - [Wallabag](https://wallabag.org) Save articles you read locally or on your phone
- - [Archivematica](https://github.com/artefactual/archivematica) web GUI for institutional long-term archiving of web and other content
-
-**Discussions:**
- - [Hacker News Discussion](https://news.ycombinator.com/item?id=14272133)
- - [Reddit r/selfhosted Discussion](https://www.reddit.com/r/selfhosted/comments/69eoi3/pocket_stream_archive_your_own_personal_wayback/)
- - [Reddit r/datahoarder Discussion #1](https://www.reddit.com/r/DataHoarder/comments/69e6i9/archive_a_browseable_copy_of_your_saved_pocket/)
- - [Reddit r/datahoarder Discussion #2](https://www.reddit.com/r/DataHoarder/comments/6kepv6/bookmarkarchiver_now_supports_archiving_all_major/)
-
-
-**Tools/Other:**
- - https://github.com/ikreymer/webarchiveplayer#auto-load-warcs
- - [Sheetsee-Pocket](http://jlord.us/sheetsee-pocket/) project that provides a pretty auto-updating index of your Pocket links (without archiving them)
- - [Pocket -> IFTTT -> Dropbox](https://christopher.su/2013/saving-pocket-links-file-day-dropbox-ifttt-launchd/) Post by Christopher Su on his Pocket saving IFTTT recipe
-
-
-## Roadmap
-
-[*Official Roadmap*](https://github.com/pirate/ArchiveBox/issues/120).
-
-If you feel like contributing a PR, some of these tasks are pretty easy. Feel free to open an issue if you need help getting started in any way!
-
-**Major upcoming changes:**
-
- - finalize python packaging to allow installing via pip and importing individual components
- - add an optional web GUI for managing sources, adding new links, and viewing the archive
-
-**Minor upcoming changes:**
- - download closed-captions text from youtube videos
- - body text extraction using [fathom](https://hacks.mozilla.org/2017/04/fathom-a-framework-for-understanding-web-pages/)
- - auto-tagging based on important extracted words
- - audio & video archiving with `youtube-dl`
- - full-text indexing with elasticsearch/elasticlunr/ag
- - video closed-caption downloading on Youtube for full-text indexing of video content
- - automatic text summaries of article with nlp summarization library
- - featured image extraction
- - http support (from my https-only domain)
- - try wgetting dead sites from archive.org (https://github.com/hartator/wayback-machine-downloader)
-
-
-## Changelog
-
- - v0.2.0 released with new name
- - [renamed](https://github.com/pirate/ArchiveBox/issues/108) from **Bookmark Archiver** -> **ArchiveBox**
- - v0.1.0 released
- - support for browser history exporting added with `./bin/archivebox-export-browser-history`
- - support for chrome `--dump-dom` to output full page HTML after JS executes
- - v0.0.3 released
- - support for chrome `--user-data-dir` to archive sites that need logins
- - fancy individual html & json indexes for each link
- - smartly append new links to existing index instead of overwriting
- - v0.0.2 released
- - proper HTML templating instead of format strings (thanks to https://github.com/bardisty!)
- - refactored into separate files, wip audio & video archiving
- - v0.0.1 released
- - Index links now work without nginx url rewrites, archive can now be hosted on github pages
- - added setup.sh script & docstrings & help commands
- - made Chromium the default instead of Google Chrome (yay free software)
- - added [env-variable](https://github.com/pirate/ArchiveBox/pull/25) configuration (thanks to https://github.com/hannah98!)
- - renamed from **Pocket Archive Stream** -> **Bookmark Archiver**
- - added [Netscape-format](https://github.com/pirate/ArchiveBox/pull/20) export support (thanks to https://github.com/ilvar!)
- - added [Pinboard-format](https://github.com/pirate/ArchiveBox/pull/7) export support (thanks to https://github.com/sconeyard!)
- - front-page of HN, oops! apparently I have users to support now :grin:?
- - added Pocket-format export support
- - v0.0.0 released: created Pocket Archive Stream 2017/05/05
-
-
-## Donations
-
-https://www.patreon.com/theSquashSH
-
-If you want to help sponsor this project long-term or just say thanks or suggest changes, contact me at bookmark-archiver@sweeting.me.
-
-[Other Grants / Donations Info](https://github.com/pirate/ArchiveBox/blob/master/DONATE.md)
+ - [Roadmap](https://github.com/pirate/ArchiveBox/wiki/Roadmap)
+ - [Changelog](https://github.com/pirate/ArchiveBox/wiki/Changelog)
+ - [Donations](https://github.com/pirate/ArchiveBox/wiki/Donations)
+ - [Web Archiving Community](https://github.com/pirate/ArchiveBox/wiki/Web-Archiving-Community)