From a81393b995c3644b33448b9be4e697a1fad9d1d4 Mon Sep 17 00:00:00 2001
From: Nick Sweeting
Date: Thu, 8 Apr 2021 08:03:26 -0400
Subject: [PATCH] Update README.md

---
 README.md | 63 ++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 37 insertions(+), 26 deletions(-)

diff --git a/README.md b/README.md
index 7259e6ab..02c4b2a5 100644
--- a/README.md
+++ b/README.md
@@ -56,7 +56,7 @@ At the end of the day, the goal is to sleep soundly knowing the part of the inte
 1. Run this in a new empty folder to get started
 ```bash
-archivebox init --setup # create a new collection in the current directory
+archivebox init --setup # creates a new collection in the current directory
 ```
 2. Add some URLs you want to archive
@@ -108,9 +108,8 @@ ls ./archive/*/index.json # or browse directly via the filesyste
 ### Quickstart
-**🖥  Supported OSs:** Linux/BSD, macOS, Windows (w/ Docker, WSL/WSL2)     **🎮  CPU Architectures:** x86, amd64, arm7, arm8 (raspi >=3)
+**🖥  Supported OSs:** Linux/BSD, macOS, Windows (w/ Docker, WSL/WSL2)     **🎮  CPU Architectures:** amd64, x86, arm8, arm7 (raspi >=3)
-
 #### ⬇️  Initial Setup
@@ -319,11 +318,10 @@ archivebox config --set PUBLIC_ADD_VIEW=False
-
 . . . . . . . . . . . . . . . . . . . . . . . . . . . .

 DEMO: https://demo.archivebox.io
-Quickstart | Usage | Configuration
+Usage | Configuration | Caveats
@@ -343,6 +341,17 @@ You don't need to install all the dependencies, ArchiveBox will automatically en
 If you so choose, you can also install ArchiveBox and its dependencies directly on any Linux or macOS systems using the [system package manager](https://github.com/ArchiveBox/ArchiveBox/wiki/Install) and the `archivebox setup` command.
+```bash
+# install archivebox with your system package manager
+# apt/brew/pip/etc install ... (see Quickstart instructions above)
+
+# run the setup to auto install all the extractors and extras
+archivebox setup
+
+# see information about all the dependencies
+archivebox --version
+```
+
 ArchiveBox is written in Python 3 so it requires `python3` and `pip3` available on your system. It also uses a set of optional, but highly recommended external dependencies for archiving sites: `wget` (for plain HTML, static files, and WARC saving), `chromium` (for screenshots, PDFs, JS execution, and more), `youtube-dl` (for audio and video), `git` (for cloning git repos), and `nodejs` (for readability, mercury, and singlefile), and more.
@@ -351,6 +360,13 @@ ArchiveBox is written in Python 3 so it requires `python3` and `pip3` available
 ArchiveBox supports many input formats for URLs, including Pocket & Pinboard exports, Browser bookmarks, Browser history, plain text, HTML, markdown, and more!
+
+*Click these links for instructions on how to prepare your links from these sources:*
+
+- TXT, RSS, XML, JSON, CSV, SQL, HTML, Markdown, or [any other text-based format...](https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#Import-a-list-of-URLs-from-a-text-file)
+- [Browser history](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive) or [browser bookmarks](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive) (see instructions for: [Chrome](https://support.google.com/chrome/answer/96816?hl=en), [Firefox](https://support.mozilla.org/en-US/kb/export-firefox-bookmarks-to-backup-or-transfer), [Safari](http://i.imgur.com/AtcvUZA.png), [IE](https://support.microsoft.com/en-us/help/211089/how-to-import-and-export-the-internet-explorer-favorites-folder-to-a-32-bit-version-of-windows), [Opera](http://help.opera.com/Windows/12.10/en/importexport.html), [and more...](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive))
+- [Pocket](https://getpocket.com/export), [Pinboard](https://pinboard.in/export/), [Instapaper](https://www.instapaper.com/user/export), [Shaarli](https://shaarli.readthedocs.io/en/master/Usage/#importexport), [Delicious](https://www.groovypost.com/howto/howto/export-delicious-bookmarks-xml/), [Reddit Saved](https://github.com/csu/export-saved-reddit), [Wallabag](https://doc.wallabag.org/en/user/import/wallabagv2.html), [Unmark.it](http://help.unmark.it/import-export), [OneTab](https://www.addictivetips.com/web/onetab-save-close-all-chrome-tabs-to-restore-export-or-import/), [and more...](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive)
+
 ```bash
 echo 'http://example.com' | archivebox add
 archivebox add 'https://example.com/some/page'
@@ -366,12 +382,6 @@ echo 'https://example.com' | docker run -v $PWD:/data -i archivebox/archivebox a
 echo 'https://example.com' | docker-compose run -T archivebox add
 ```
-*Click these links for instructions on how to propare your links from these sources:*
-
-- TXT, RSS, XML, JSON, CSV, SQL, HTML, Markdown, or [any other text-based format...](https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#Import-a-list-of-URLs-from-a-text-file)
-- [Browser history](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive) or [browser bookmarks](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive) (see instructions for: [Chrome](https://support.google.com/chrome/answer/96816?hl=en), [Firefox](https://support.mozilla.org/en-US/kb/export-firefox-bookmarks-to-backup-or-transfer), [Safari](http://i.imgur.com/AtcvUZA.png), [IE](https://support.microsoft.com/en-us/help/211089/how-to-import-and-export-the-internet-explorer-favorites-folder-to-a-32-bit-version-of-windows), [Opera](http://help.opera.com/Windows/12.10/en/importexport.html), [and more...](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive))
-- [Pocket](https://getpocket.com/export), [Pinboard](https://pinboard.in/export/), [Instapaper](https://www.instapaper.com/user/export), [Shaarli](https://shaarli.readthedocs.io/en/master/Usage/#importexport), [Delicious](https://www.groovypost.com/howto/howto/export-delicious-bookmarks-xml/), [Reddit Saved](https://github.com/csu/export-saved-reddit), [Wallabag](https://doc.wallabag.org/en/user/import/wallabagv2.html), [Unmark.it](http://help.unmark.it/import-export), [OneTab](https://www.addictivetips.com/web/onetab-save-close-all-chrome-tabs-to-restore-export-or-import/), [and more...](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive)
-
 See the [Usage: CLI](https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#CLI-Usage) page for documentation and examples.
 It also includes a built-in scheduled import feature with `archivebox schedule` and browser bookmarklet, so you can pull in URLs from RSS feeds, websites, or the filesystem regularly/on-demand.
@@ -382,21 +392,6 @@ It also includes a built-in scheduled import feature with `archivebox schedule`
 All of ArchiveBox's state (including the index, snapshot data, and config file) is stored in a single folder called the "ArchiveBox data folder". All `archivebox` CLI commands must be run from inside this folder, and you first create it by running `archivebox init`.
-The on-disk layout is optimized to be easy to browse by hand and durable long-term. The main index is a standard sqlite3 database (it can also be exported as static JSON/HTML), and the archive snapshots are organized by date-added timestamp in the `archive/` subfolder. Each snapshot subfolder includes a static JSON and HTML index describing its contents, and the snapshot extrator outputs are plain files within the folder (e.g. `media/example.mp4`, `git/somerepo.git`, `static/someimage.png`, etc.)
-
-```bash
-# to browse your index statically without running the archivebox server, run:
-archivebox list --html --with-headers > index.html
-archivebox list --json --with-headers > index.json
-# if running these commands with docker-compose, add -T:
-# docker-compose run -T archivebox list ...
-
-# then open the static index in a browser
-open index.html
-
-# or browse the snapshots via filesystem directly
-ls ./archive//
-```
 - **Index:** `index.html` & `index.json` HTML and JSON index files containing metadata and details
 - **Title**, **Favicon**, **Headers** Response headers, site favicon, and parsed site title
@@ -420,6 +415,22 @@ archivebox config --set YOUTUBEDL_ARGS='--max-filesize=500m'
 archivebox config --help
 ```
+The on-disk layout is optimized to be easy to browse by hand and durable long-term. The main index is a standard sqlite3 database (it can also be exported as static JSON/HTML), and the archive snapshots are organized by date-added timestamp in the `archive/` subfolder. Each snapshot subfolder includes a static JSON and HTML index describing its contents, and the snapshot extractor outputs are plain files within the folder (e.g. `media/example.mp4`, `git/somerepo.git`, `static/someimage.png`, etc.)
+
+```bash
+# to browse your index statically without running the archivebox server, run:
+archivebox list --html --with-headers > index.html
+archivebox list --json --with-headers > index.json
+# if running these commands with docker-compose, add -T:
+# docker-compose run -T archivebox list ...
+
+# then open the static index in a browser
+open index.html
+
+# or browse the snapshots via filesystem directly
+ls ./archive//
+```
+
---