fix README formatting for static site generator

Nick Sweeting 2021-01-19 22:02:35 -05:00 committed by GitHub
parent e9490ccfeb
commit 6c288f10e5
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23

@ -84,6 +84,7 @@ docker-compose run archivebox help # to see more options
<summary><b>Get ArchiveBox with <code>docker</code> on any platform</b></summary> <summary><b>Get ArchiveBox with <code>docker</code> on any platform</b></summary>
First make sure you have Docker installed: https://docs.docker.com/get-docker/<br/> First make sure you have Docker installed: https://docs.docker.com/get-docker/<br/>
```bash ```bash
# create a new empty directory and initialize your collection (can be anywhere) # create a new empty directory and initialize your collection (can be anywhere)
mkdir ~/archivebox && cd ~/archivebox mkdir ~/archivebox && cd ~/archivebox
@ -130,6 +131,7 @@ archivebox help # to see more options
``` ```
For other Debian-based systems or older Ubuntu systems you can add these sources to `/etc/apt/sources.list`: For other Debian-based systems or older Ubuntu systems you can add these sources to `/etc/apt/sources.list`:
```bash ```bash
deb http://ppa.launchpad.net/archivebox/archivebox/ubuntu focal main deb http://ppa.launchpad.net/archivebox/archivebox/ubuntu focal main
deb-src http://ppa.launchpad.net/archivebox/archivebox/ubuntu focal main deb-src http://ppa.launchpad.net/archivebox/archivebox/ubuntu focal main
@ -300,6 +302,7 @@ ArchiveBox is written in Python 3 so it requires `python3` and `pip3` available
## Caveats ## Caveats
If you're importing URLs containing secret slugs or pages with private content (e.g. Google Docs, CodiMD notepads, etc.), you may want to disable some of the extractor modules to avoid leaking private URLs to 3rd party APIs during the archiving process. If you're importing URLs containing secret slugs or pages with private content (e.g. Google Docs, CodiMD notepads, etc.), you may want to disable some of the extractor modules to avoid leaking private URLs to 3rd party APIs during the archiving process.
```bash ```bash
# don't do this: # don't do this:
archivebox add 'https://docs.google.com/document/d/12345somelongsecrethere' archivebox add 'https://docs.google.com/document/d/12345somelongsecrethere'
@ -312,6 +315,7 @@ archivebox config --set CHROME_BINARY=chromium # optional: switch to chromium t
``` ```
Be aware that malicious archived JS can also read the contents of other pages in your archive due to snapshot CSRF and XSS protections being imperfect. See the [Security Overview](https://github.com/ArchiveBox/ArchiveBox/wiki/Security-Overview#stealth-mode) page for more details. Be aware that malicious archived JS can also read the contents of other pages in your archive due to snapshot CSRF and XSS protections being imperfect. See the [Security Overview](https://github.com/ArchiveBox/ArchiveBox/wiki/Security-Overview#stealth-mode) page for more details.
```bash ```bash
# visiting an archived page with malicious JS: # visiting an archived page with malicious JS:
https://127.0.0.1:8000/archive/1602401954/example.com/index.html https://127.0.0.1:8000/archive/1602401954/example.com/index.html
@ -323,6 +327,7 @@ https://127.0.0.1:8000/archive/*
``` ```
Support for saving multiple snapshots of each site over time will be [added soon](https://github.com/ArchiveBox/ArchiveBox/issues/179) (along with the ability to view diffs of the changes between runs). For now ArchiveBox is designed to only archive each URL with each extractor type once. A workaround to take multiple snapshots of the same URL is to make them slightly different by adding a hash: Support for saving multiple snapshots of each site over time will be [added soon](https://github.com/ArchiveBox/ArchiveBox/issues/179) (along with the ability to view diffs of the changes between runs). For now ArchiveBox is designed to only archive each URL with each extractor type once. A workaround to take multiple snapshots of the same URL is to make them slightly different by adding a hash:
```bash ```bash
archivebox add 'https://example.com#2020-10-24' archivebox add 'https://example.com#2020-10-24'
... ...
@ -442,29 +447,41 @@ All contributions to ArchiveBox are welcomed! Check our [issues](https://github.
### Setup the dev environment ### Setup the dev environment
First, install the system dependencies from the "Bare Metal" section above. #### 1. Clone the main code repo (making sure to pull the submodules as well)
Then you can clone the ArchiveBox repo and install
```python3 ```bash
git clone https://github.com/ArchiveBox/ArchiveBox && cd ArchiveBox git clone --recurse-submodules https://github.com/ArchiveBox/ArchiveBox
git checkout master # or the branch you want to test cd ArchiveBox
git checkout dev # or the branch you want to test
git submodule update --init --recursive git submodule update --init --recursive
git pull --recurse-submodules git pull --recurse-submodules
```
#### 2. Option A: Install the Python, JS, and system dependencies directly on your machine
```bash
# Install ArchiveBox + python dependencies # Install ArchiveBox + python dependencies
python3 -m venv .venv && source .venv/bin/activate && pip install -e .[dev] python3 -m venv .venv && source .venv/bin/activate && pip install -e '.[dev]'
# or with pipenv: pipenv install --dev && pipenv shell # or: pipenv install --dev && pipenv shell
# Install node dependencies # Install node dependencies
npm install npm install
# Optional: install extractor dependencies manually or with helper script # Check to see if anything is missing
archivebox --version
# install any missing dependencies manually, or use the helper script:
./bin/setup.sh ./bin/setup.sh
```
#### 2. Option B: Build the docker container and use that for development instead
```bash
# Optional: develop via docker by mounting the code dir into the container # Optional: develop via docker by mounting the code dir into the container
# if you edit e.g. ./archivebox/core/models.py on the docker host, runserver # if you edit e.g. ./archivebox/core/models.py on the docker host, runserver
# inside the container will reload and pick up your changes # inside the container will reload and pick up your changes
docker build . -t archivebox docker build . -t archivebox
docker run -it -p 8000:8000 \ docker run -it --rm archivebox version
docker run -it --rm -p 8000:8000 \
-v $PWD/data:/data \ -v $PWD/data:/data \
-v $PWD/archivebox:/app/archivebox \ -v $PWD/archivebox:/app/archivebox \
archivebox server 0.0.0.0:8000 --debug --reload archivebox server 0.0.0.0:8000 --debug --reload
@ -495,7 +512,7 @@ You can also run all these in Docker. For more examples see the Github Actions C
cd archivebox/ cd archivebox/
./manage.py makemigrations ./manage.py makemigrations
cd data/ cd path/to/test/data/
archivebox shell archivebox shell
``` ```
(uses `pytest -s`) (uses `pytest -s`)
@ -517,9 +534,14 @@ archivebox shell
```bash ```bash
./bin/release.sh ./bin/release.sh
```
(bumps the version, builds, and pushes a release to PyPI, Docker Hub, and GitHub Packages)
# or individually:
./bin/release_docs.sh
./bin/release_pip.sh
./bin/release_deb.sh
./bin/release_brew.sh
./bin/release_docker.sh
```
--- ---