mirror of https://github.com/ArchiveBox/ArchiveBox (synced 2024-11-22 12:13:05 +00:00)
Update README.md
parent 1224cd197e
commit a81393b995
1 changed files with 37 additions and 26 deletions
63
README.md
@@ -56,7 +56,7 @@ At the end of the day, the goal is to sleep soundly knowing the part of the inte

1. Run this in a new empty folder to get started
```bash
archivebox init --setup # create a new collection in the current directory
archivebox init --setup # creates a new collection in the current directory
```

2. Add some URLs you want to archive
@@ -108,9 +108,8 @@ ls ./archive/*/index.json # or browse directly via the filesyste

### Quickstart

**🖥 Supported OSs:** Linux/BSD, macOS, Windows (w/ Docker, WSL/WSL2) **🎮 CPU Architectures:** x86, amd64, arm7, arm8 (raspi >=3)
**🖥 Supported OSs:** Linux/BSD, macOS, Windows (w/ Docker, WSL/WSL2) **🎮 CPU Architectures:** amd64, x86, arm8, arm7 (raspi >=3)

<br/>

#### ⬇️ Initial Setup

@@ -319,11 +318,10 @@ archivebox config --set PUBLIC_ADD_VIEW=False
<br/>

<div align="center">
<br/>
<sub>. . . . . . . . . . . . . . . . . . . . . . . . . . . .</sub>
<br/><br/>
<a href="https://demo.archivebox.io">DEMO: <code>https://demo.archivebox.io</code></a><br/>
<a href="https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart">Quickstart</a> | <a href="https://github.com/ArchiveBox/ArchiveBox/wiki/Usage">Usage</a> | <a href="https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration">Configuration</a>
<a href="https://github.com/ArchiveBox/ArchiveBox/wiki/Usage">Usage</a> | <a href="https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration">Configuration</a> | <a href="#Caveats">Caveats</a>
<br/>
</div>

@@ -343,6 +341,17 @@ You don't need to install all the dependencies, ArchiveBox will automatically en

If you so choose, you can also install ArchiveBox and its dependencies directly on any Linux or macOS systems using the [system package manager](https://github.com/ArchiveBox/ArchiveBox/wiki/Install) and the `archivebox setup` command.

```bash
# install archivebox with your system package manager
# apt/brew/pip/etc install ... (see Quickstart instructions above)

# run the setup to auto install all the extractors and extras
archivebox setup

# see information about all the dependencies
archivebox --version
```

ArchiveBox is written in Python 3 so it requires `python3` and `pip3` available on your system. It also uses a set of optional, but highly recommended external dependencies for archiving sites: `wget` (for plain HTML, static files, and WARC saving), `chromium` (for screenshots, PDFs, JS execution, and more), `youtube-dl` (for audio and video), `git` (for cloning git repos), and `nodejs` (for readability, mercury, and singlefile), and more.

<br/>
@@ -351,6 +360,13 @@ ArchiveBox is written in Python 3 so it requires `python3` and `pip3` available

ArchiveBox supports many input formats for URLs, including Pocket & Pinboard exports, Browser bookmarks, Browser history, plain text, HTML, markdown, and more!


*Click these links for instructions on how to propare your links from these sources:*

- <img src="https://nicksweeting.com/images/rss.svg" height="22px"/> TXT, RSS, XML, JSON, CSV, SQL, HTML, Markdown, or [any other text-based format...](https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#Import-a-list-of-URLs-from-a-text-file)
- <img src="https://nicksweeting.com/images/bookmarks.png" height="22px"/> [Browser history](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive) or [browser bookmarks](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive) (see instructions for: [Chrome](https://support.google.com/chrome/answer/96816?hl=en), [Firefox](https://support.mozilla.org/en-US/kb/export-firefox-bookmarks-to-backup-or-transfer), [Safari](http://i.imgur.com/AtcvUZA.png), [IE](https://support.microsoft.com/en-us/help/211089/how-to-import-and-export-the-internet-explorer-favorites-folder-to-a-32-bit-version-of-windows), [Opera](http://help.opera.com/Windows/12.10/en/importexport.html), [and more...](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive))
- <img src="https://getpocket.com/favicon.ico" height="22px"/> [Pocket](https://getpocket.com/export), [Pinboard](https://pinboard.in/export/), [Instapaper](https://www.instapaper.com/user/export), [Shaarli](https://shaarli.readthedocs.io/en/master/Usage/#importexport), [Delicious](https://www.groovypost.com/howto/howto/export-delicious-bookmarks-xml/), [Reddit Saved](https://github.com/csu/export-saved-reddit), [Wallabag](https://doc.wallabag.org/en/user/import/wallabagv2.html), [Unmark.it](http://help.unmark.it/import-export), [OneTab](https://www.addictivetips.com/web/onetab-save-close-all-chrome-tabs-to-restore-export-or-import/), [and more...](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive)

```bash
echo 'http://example.com' | archivebox add
archivebox add 'https://example.com/some/page'
@@ -366,12 +382,6 @@ echo 'https://example.com' | docker run -v $PWD:/data -i archivebox/archivebox a
echo 'https://example.com' | docker-compose run -T archivebox add
```

*Click these links for instructions on how to propare your links from these sources:*

- <img src="https://nicksweeting.com/images/rss.svg" height="22px"/> TXT, RSS, XML, JSON, CSV, SQL, HTML, Markdown, or [any other text-based format...](https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#Import-a-list-of-URLs-from-a-text-file)
- <img src="https://nicksweeting.com/images/bookmarks.png" height="22px"/> [Browser history](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive) or [browser bookmarks](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive) (see instructions for: [Chrome](https://support.google.com/chrome/answer/96816?hl=en), [Firefox](https://support.mozilla.org/en-US/kb/export-firefox-bookmarks-to-backup-or-transfer), [Safari](http://i.imgur.com/AtcvUZA.png), [IE](https://support.microsoft.com/en-us/help/211089/how-to-import-and-export-the-internet-explorer-favorites-folder-to-a-32-bit-version-of-windows), [Opera](http://help.opera.com/Windows/12.10/en/importexport.html), [and more...](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive))
- <img src="https://getpocket.com/favicon.ico" height="22px"/> [Pocket](https://getpocket.com/export), [Pinboard](https://pinboard.in/export/), [Instapaper](https://www.instapaper.com/user/export), [Shaarli](https://shaarli.readthedocs.io/en/master/Usage/#importexport), [Delicious](https://www.groovypost.com/howto/howto/export-delicious-bookmarks-xml/), [Reddit Saved](https://github.com/csu/export-saved-reddit), [Wallabag](https://doc.wallabag.org/en/user/import/wallabagv2.html), [Unmark.it](http://help.unmark.it/import-export), [OneTab](https://www.addictivetips.com/web/onetab-save-close-all-chrome-tabs-to-restore-export-or-import/), [and more...](https://github.com/ArchiveBox/ArchiveBox/wiki/Quickstart#2-get-your-list-of-urls-to-archive)

See the [Usage: CLI](https://github.com/ArchiveBox/ArchiveBox/wiki/Usage#CLI-Usage) page for documentation and examples.

It also includes a built-in scheduled import feature with `archivebox schedule` and browser bookmarklet, so you can pull in URLs from RSS feeds, websites, or the filesystem regularly/on-demand.
@@ -382,21 +392,6 @@ It also includes a built-in scheduled import feature with `archivebox schedule`

All of ArchiveBox's state (including the index, snapshot data, and config file) is stored in a single folder called the "ArchiveBox data folder". All `archivebox` CLI commands must be run from inside this folder, and you first create it by running `archivebox init`.

The on-disk layout is optimized to be easy to browse by hand and durable long-term. The main index is a standard sqlite3 database (it can also be exported as static JSON/HTML), and the archive snapshots are organized by date-added timestamp in the `archive/` subfolder. Each snapshot subfolder includes a static JSON and HTML index describing its contents, and the snapshot extrator outputs are plain files within the folder (e.g. `media/example.mp4`, `git/somerepo.git`, `static/someimage.png`, etc.)

```bash
# to browse your index statically without running the archivebox server, run:
archivebox list --html --with-headers > index.html
archivebox list --json --with-headers > index.json
# if running these commands with docker-compose, add -T:
# docker-compose run -T archivebox list ...

# then open the static index in a browser
open index.html

# or browse the snapshots via filesystem directly
ls ./archive/<timestamp>/
```

- **Index:** `index.html` & `index.json` HTML and JSON index files containing metadata and details
- **Title**, **Favicon**, **Headers** Response headers, site favicon, and parsed site title
@@ -420,6 +415,22 @@ archivebox config --set YOUTUBEDL_ARGS='--max-filesize=500m'
archivebox config --help
```

The on-disk layout is optimized to be easy to browse by hand and durable long-term. The main index is a standard sqlite3 database (it can also be exported as static JSON/HTML), and the archive snapshots are organized by date-added timestamp in the `archive/` subfolder. Each snapshot subfolder includes a static JSON and HTML index describing its contents, and the snapshot extrator outputs are plain files within the folder (e.g. `media/example.mp4`, `git/somerepo.git`, `static/someimage.png`, etc.)

```bash
# to browse your index statically without running the archivebox server, run:
archivebox list --html --with-headers > index.html
archivebox list --json --with-headers > index.json
# if running these commands with docker-compose, add -T:
# docker-compose run -T archivebox list ...

# then open the static index in a browser
open index.html

# or browse the snapshots via filesystem directly
ls ./archive/<timestamp>/
```

<br/>

---