mirror of
https://github.com/ArchiveBox/ArchiveBox
synced 2024-11-10 06:34:16 +00:00
Update README.md
parent
1224cd197e
commit
a81393b995
1 changed files with 37 additions and 26 deletions
63
README.md
|
@ -56,7 +56,7 @@ At the end of the day, the goal is to sleep soundly knowing the part of the inte
|
||||||
|
|
||||||
1. Run this in a new empty folder to get started
|
1. Run this in a new empty folder to get started
|
||||||
```bash
|
```bash
|
||||||
archivebox init --setup # create a new collection in the current directory
|
archivebox init --setup # creates a new collection in the current directory
|
||||||
```
|
```
|
||||||
|
|
||||||
2. Add some URLs you want to archive
|
2. Add some URLs you want to archive
|
||||||
|
@ -108,9 +108,8 @@ ls ./archive/*/index.json # or browse directly via the filesyste
|
||||||
|
|
||||||
### Quickstart
|
### Quickstart
|
||||||
|
|
||||||
**🖥 Supported OSs:** Linux/BSD, macOS, Windows (w/ Docker, WSL/WSL2) **🎮 CPU Architectures:** x86, amd64, arm7, arm8 (raspi >=3)
|
**🖥 Supported OSs:** Linux/BSD, macOS, Windows (w/ Docker, WSL/WSL2) **🎮 CPU Architectures:** amd64, x86, arm8, arm7 (raspi >=3)
|
||||||
|
|
||||||
<br/>
|
|
||||||
|
|
||||||
#### ⬇️ Initial Setup
|
#### ⬇️ Initial Setup
|
||||||
|
|
||||||
|
@ -319,11 +318,10 @@ archivebox config --set PUBLIC_ADD_VIEW=False
|
||||||
<br/>
|
<br/>
|
||||||
|
|
||||||
<div align="center">
|
<div align="center">
|
||||||
<br/>
|
|
||||||
<sub>. . . . . . . . . . . . . . . . . . . . . . . . . . . .</sub>
|
<sub>. . . . . . . . . . . . . . . . . . . . . . . . . . . .</sub>
|
||||||
<br/><br/>
|
<br/><br/>
|
||||||
<a href="https://demo.archivebox.io">DEMO: <code>https://demo.archivebox.io</code></a><br/>
|
<a href="https://demo.archivebox.io">DEMO: <code>https://demo.archivebox.io</code></a><br/>
|
||||||
<a href="https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart">Quickstart</a> | <a href="https://github.com/ArchiveBox/ArchiveBox/wiki/Usage">Usage</a> | <a href="https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration">Configuration</a>
|
<a href="https://github.com/ArchiveBox/ArchiveBox/wiki/Usage">Usage</a> | <a href="https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration">Configuration</a> | <a href="#Caveats">Caveats</a>
|
||||||
<br/>
|
<br/>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
|
@ -343,6 +341,17 @@ You don't need to install all the dependencies, ArchiveBox will automatically en
|
||||||
|
|
||||||
If you so choose, you can also install ArchiveBox and its dependencies directly on any Linux or macOS systems using the [system package manager](https://github.com/ArchiveBox/ArchiveBox/wiki/Install) and the `archivebox setup` command.
|
If you so choose, you can also install ArchiveBox and its dependencies directly on any Linux or macOS systems using the [system package manager](https://github.com/ArchiveBox/ArchiveBox/wiki/Install) and the `archivebox setup` command.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# install archivebox with your system package manager
|
||||||
|
# apt/brew/pip/etc install ... (see Quickstart instructions above)
|
||||||
|
|
||||||
|
# run the setup to auto install all the extractors and extras
|
||||||
|
archivebox setup
|
||||||
|
|
||||||
|
# see information about all the dependencies
|
||||||
|
archivebox --version
|
||||||
|
```
|
||||||
|
|
||||||
ArchiveBox is written in Python 3 so it requires `python3` and `pip3` available on your system. It also uses a set of optional, but highly recommended external dependencies for archiving sites: `wget` (for plain HTML, static files, and WARC saving), `chromium` (for screenshots, PDFs, JS execution, and more), `youtube-dl` (for audio and video), `git` (for cloning git repos), and `nodejs` (for readability, mercury, and singlefile), and more.
|
ArchiveBox is written in Python 3 so it requires `python3` and `pip3` available on your system. It also uses a set of optional, but highly recommended external dependencies for archiving sites: `wget` (for plain HTML, static files, and WARC saving), `chromium` (for screenshots, PDFs, JS execution, and more), `youtube-dl` (for audio and video), `git` (for cloning git repos), and `nodejs` (for readability, mercury, and singlefile), and more.
|
||||||
|
|
||||||
<br/>
|
<br/>
|
||||||
|
@ -351,6 +360,13 @@ ArchiveBox is written in Python 3 so it requires `python3` and `pip3` available
|
||||||
|
|
||||||
ArchiveBox supports many input formats for URLs, including Pocket & Pinboard exports, Browser bookmarks, Browser history, plain text, HTML, markdown, and more!
|
ArchiveBox supports many input formats for URLs, including Pocket & Pinboard exports, Browser bookmarks, Browser history, plain text, HTML, markdown, and more!
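Since plain text is one of the accepted inputs, it can help to preview which URLs a file actually contains before importing it. The following is a minimal sketch, not part of ArchiveBox: `extract_urls` is a hypothetical helper name, and the regular expression is a rough approximation of what counts as a URL.

```bash
# Hypothetical helper (not an ArchiveBox command): print every http(s) URL
# found in the given text file, one per line, sorted and de-duplicated.
extract_urls() {
    grep -Eo "https?://[^[:space:]\"'<>)]+" "$1" | LC_ALL=C sort -u
}

# Example usage (assumes a file named notes.txt exists and archivebox is installed):
#   extract_urls notes.txt                    # preview the URLs
#   extract_urls notes.txt | archivebox add   # pipe them in via stdin
```

Piping into `archivebox add` mirrors the `echo ... | archivebox add` form used elsewhere in this README; the preview step is optional.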
|
||||||
|
|
||||||
|
|
||||||
|
*Click these links for instructions on how to prepare your links from these sources:*
|
||||||
|
|
||||||
|
- <img src="https://nicksweeting.com/images/rss.svg" height="22px"/> TXT, RSS, XML, JSON, CSV, SQL, HTML, Markdown, or [any other text-based format...](https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#Import-a-list-of-URLs-from-a-text-file)
|
||||||
|
- <img src="https://nicksweeting.com/images/bookmarks.png" height="22px"/> [Browser history](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive) or [browser bookmarks](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive) (see instructions for: [Chrome](https://support.google.com/chrome/answer/96816?hl=en), [Firefox](https://support.mozilla.org/en-US/kb/export-firefox-bookmarks-to-backup-or-transfer), [Safari](http://i.imgur.com/AtcvUZA.png), [IE](https://support.microsoft.com/en-us/help/211089/how-to-import-and-export-the-internet-explorer-favorites-folder-to-a-32-bit-version-of-windows), [Opera](http://help.opera.com/Windows/12.10/en/importexport.html), [and more...](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive))
|
||||||
|
- <img src="https://getpocket.com/favicon.ico" height="22px"/> [Pocket](https://getpocket.com/export), [Pinboard](https://pinboard.in/export/), [Instapaper](https://www.instapaper.com/user/export), [Shaarli](https://shaarli.readthedocs.io/en/master/Usage/#importexport), [Delicious](https://www.groovypost.com/howto/howto/export-delicious-bookmarks-xml/), [Reddit Saved](https://github.com/csu/export-saved-reddit), [Wallabag](https://doc.wallabag.org/en/user/import/wallabagv2.html), [Unmark.it](http://help.unmark.it/import-export), [OneTab](https://www.addictivetips.com/web/onetab-save-close-all-chrome-tabs-to-restore-export-or-import/), [and more...](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive)
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
echo 'http://example.com' | archivebox add
|
echo 'http://example.com' | archivebox add
|
||||||
archivebox add 'https://example.com/some/page'
|
archivebox add 'https://example.com/some/page'
|
||||||
|
@ -366,12 +382,6 @@ echo 'https://example.com' | docker run -v $PWD:/data -i archivebox/archivebox a
|
||||||
echo 'https://example.com' | docker-compose run -T archivebox add
|
echo 'https://example.com' | docker-compose run -T archivebox add
|
||||||
```
|
```
|
||||||
|
|
||||||
*Click these links for instructions on how to prepare your links from these sources:*
|
|
||||||
|
|
||||||
- <img src="https://nicksweeting.com/images/rss.svg" height="22px"/> TXT, RSS, XML, JSON, CSV, SQL, HTML, Markdown, or [any other text-based format...](https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#Import-a-list-of-URLs-from-a-text-file)
|
|
||||||
- <img src="https://nicksweeting.com/images/bookmarks.png" height="22px"/> [Browser history](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive) or [browser bookmarks](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive) (see instructions for: [Chrome](https://support.google.com/chrome/answer/96816?hl=en), [Firefox](https://support.mozilla.org/en-US/kb/export-firefox-bookmarks-to-backup-or-transfer), [Safari](http://i.imgur.com/AtcvUZA.png), [IE](https://support.microsoft.com/en-us/help/211089/how-to-import-and-export-the-internet-explorer-favorites-folder-to-a-32-bit-version-of-windows), [Opera](http://help.opera.com/Windows/12.10/en/importexport.html), [and more...](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive))
|
|
||||||
- <img src="https://getpocket.com/favicon.ico" height="22px"/> [Pocket](https://getpocket.com/export), [Pinboard](https://pinboard.in/export/), [Instapaper](https://www.instapaper.com/user/export), [Shaarli](https://shaarli.readthedocs.io/en/master/Usage/#importexport), [Delicious](https://www.groovypost.com/howto/howto/export-delicious-bookmarks-xml/), [Reddit Saved](https://github.com/csu/export-saved-reddit), [Wallabag](https://doc.wallabag.org/en/user/import/wallabagv2.html), [Unmark.it](http://help.unmark.it/import-export), [OneTab](https://www.addictivetips.com/web/onetab-save-close-all-chrome-tabs-to-restore-export-or-import/), [and more...](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive)
|
|
||||||
|
|
||||||
See the [Usage: CLI](https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#CLI-Usage) page for documentation and examples.
|
See the [Usage: CLI](https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#CLI-Usage) page for documentation and examples.
|
||||||
|
|
||||||
It also includes a built-in scheduled import feature with `archivebox schedule` and browser bookmarklet, so you can pull in URLs from RSS feeds, websites, or the filesystem regularly/on-demand.
|
It also includes a built-in scheduled import feature with `archivebox schedule` and browser bookmarklet, so you can pull in URLs from RSS feeds, websites, or the filesystem regularly/on-demand.
|
||||||
|
@ -382,21 +392,6 @@ It also includes a built-in scheduled import feature with `archivebox schedule`
|
||||||
|
|
||||||
All of ArchiveBox's state (including the index, snapshot data, and config file) is stored in a single folder called the "ArchiveBox data folder". All `archivebox` CLI commands must be run from inside this folder, and you first create it by running `archivebox init`.
|
All of ArchiveBox's state (including the index, snapshot data, and config file) is stored in a single folder called the "ArchiveBox data folder". All `archivebox` CLI commands must be run from inside this folder, and you first create it by running `archivebox init`.
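Because every command has to run from inside the data folder, a small guard in wrapper scripts can catch the mistake of running from the wrong directory. This is only a sketch: `is_data_folder` is a hypothetical helper, and the index filename `index.sqlite3` is an assumption about the on-disk layout that may need adjusting for your collection.

```bash
# Hypothetical helper: succeed if the given directory (default: current one)
# looks like an ArchiveBox data folder. The filename index.sqlite3 is an
# assumption; the archive/ subfolder is described in this README.
is_data_folder() {
    local dir="${1:-.}"
    [ -f "$dir/index.sqlite3" ] && [ -d "$dir/archive" ]
}

# Example usage:
#   is_data_folder || echo "not a data folder; cd into one or run: archivebox init"
```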
|
||||||
|
|
||||||
The on-disk layout is optimized to be easy to browse by hand and durable long-term. The main index is a standard sqlite3 database (it can also be exported as static JSON/HTML), and the archive snapshots are organized by date-added timestamp in the `archive/` subfolder. Each snapshot subfolder includes a static JSON and HTML index describing its contents, and the snapshot extractor outputs are plain files within the folder (e.g. `media/example.mp4`, `git/somerepo.git`, `static/someimage.png`, etc.)
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# to browse your index statically without running the archivebox server, run:
|
|
||||||
archivebox list --html --with-headers > index.html
|
|
||||||
archivebox list --json --with-headers > index.json
|
|
||||||
# if running these commands with docker-compose, add -T:
|
|
||||||
# docker-compose run -T archivebox list ...
|
|
||||||
|
|
||||||
# then open the static index in a browser
|
|
||||||
open index.html
|
|
||||||
|
|
||||||
# or browse the snapshots via filesystem directly
|
|
||||||
ls ./archive/<timestamp>/
|
|
||||||
```
|
|
||||||
|
|
||||||
- **Index:** `index.html` & `index.json` HTML and JSON index files containing metadata and details
|
- **Index:** `index.html` & `index.json` HTML and JSON index files containing metadata and details
|
||||||
- **Title**, **Favicon**, **Headers** Response headers, site favicon, and parsed site title
|
- **Title**, **Favicon**, **Headers** Response headers, site favicon, and parsed site title
|
||||||
|
@ -420,6 +415,22 @@ archivebox config --set YOUTUBEDL_ARGS='--max-filesize=500m'
|
||||||
archivebox config --help
|
archivebox config --help
|
||||||
```
|
```
|
||||||
|
|
||||||
|
The on-disk layout is optimized to be easy to browse by hand and durable long-term. The main index is a standard sqlite3 database (it can also be exported as static JSON/HTML), and the archive snapshots are organized by date-added timestamp in the `archive/` subfolder. Each snapshot subfolder includes a static JSON and HTML index describing its contents, and the snapshot extractor outputs are plain files within the folder (e.g. `media/example.mp4`, `git/somerepo.git`, `static/someimage.png`, etc.)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# to browse your index statically without running the archivebox server, run:
|
||||||
|
archivebox list --html --with-headers > index.html
|
||||||
|
archivebox list --json --with-headers > index.json
|
||||||
|
# if running these commands with docker-compose, add -T:
|
||||||
|
# docker-compose run -T archivebox list ...
|
||||||
|
|
||||||
|
# then open the static index in a browser
|
||||||
|
open index.html
|
||||||
|
|
||||||
|
# or browse the snapshots via filesystem directly
|
||||||
|
ls ./archive/<timestamp>/
|
||||||
|
```
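Since each snapshot subfolder is described as carrying its own `index.json`, the filesystem can be walked directly to see which snapshots have one. A minimal sketch under that assumption; `list_snapshots` is a hypothetical helper name, not an ArchiveBox command.

```bash
# Hypothetical helper: for each snapshot folder under <data folder>/archive/,
# print its name and whether it contains an index.json file.
list_snapshots() {
    local root="${1:-.}"
    local dir name
    for dir in "$root"/archive/*/; do
        [ -d "$dir" ] || continue          # skip the unexpanded glob when empty
        name=$(basename "$dir")
        if [ -f "$dir/index.json" ]; then
            echo "$name ok"
        else
            echo "$name missing-index"
        fi
    done
}

# Example usage, run from inside the data folder:
#   list_snapshots .
```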
|
||||||
|
|
||||||
<br/>
|
<br/>
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|