Mirror of https://github.com/ArchiveBox/ArchiveBox, synced 2024-11-10 06:34:16 +00:00
fix README formatting for static site generator
parent e9490ccfeb
commit 6c288f10e5
1 changed file with 34 additions and 12 deletions
README.md (46 lines changed)
|
@@ -84,6 +84,7 @@ docker-compose run archivebox help # to see more options
|
||||||
<summary><b>Get ArchiveBox with <code>docker</code> on any platform</b></summary>
|
<summary><b>Get ArchiveBox with <code>docker</code> on any platform</b></summary>
|
||||||
|
|
||||||
First make sure you have Docker installed: https://docs.docker.com/get-docker/<br/>
|
First make sure you have Docker installed: https://docs.docker.com/get-docker/<br/>
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# create a new empty directory and initialize your collection (can be anywhere)
|
# create a new empty directory and initialize your collection (can be anywhere)
|
||||||
mkdir ~/archivebox && cd ~/archivebox
|
mkdir ~/archivebox && cd ~/archivebox
|
||||||
|
@@ -130,6 +131,7 @@ archivebox help # to see more options
|
||||||
```
|
```
|
||||||
|
|
||||||
For other Debian-based systems or older Ubuntu systems you can add these sources to `/etc/apt/sources.list`:
|
For other Debian-based systems or older Ubuntu systems you can add these sources to `/etc/apt/sources.list`:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
deb http://ppa.launchpad.net/archivebox/archivebox/ubuntu focal main
|
deb http://ppa.launchpad.net/archivebox/archivebox/ubuntu focal main
|
||||||
deb-src http://ppa.launchpad.net/archivebox/archivebox/ubuntu focal main
|
deb-src http://ppa.launchpad.net/archivebox/archivebox/ubuntu focal main
|
||||||
|
@@ -300,6 +302,7 @@ ArchiveBox is written in Python 3 so it requires `python3` and `pip3` available
|
||||||
## Caveats
|
## Caveats
|
||||||
|
|
||||||
If you're importing URLs containing secret slugs or pages with private content (e.g. Google Docs, CodiMD notepads, etc.), you may want to disable some of the extractor modules to avoid leaking private URLs to third-party APIs during the archiving process.
|
If you're importing URLs containing secret slugs or pages with private content (e.g. Google Docs, CodiMD notepads, etc.), you may want to disable some of the extractor modules to avoid leaking private URLs to third-party APIs during the archiving process.
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# don't do this:
|
# don't do this:
|
||||||
archivebox add 'https://docs.google.com/document/d/12345somelongsecrethere'
|
archivebox add 'https://docs.google.com/document/d/12345somelongsecrethere'
|
||||||
|
@@ -312,6 +315,7 @@ archivebox config --set CHROME_BINARY=chromium # optional: switch to chromium t
|
||||||
```
|
```
|
||||||
|
|
||||||
Be aware that malicious archived JS can also read the contents of other pages in your archive due to snapshot CSRF and XSS protections being imperfect. See the [Security Overview](https://github.com/ArchiveBox/ArchiveBox/wiki/Security-Overview#stealth-mode) page for more details.
|
Be aware that malicious archived JS can also read the contents of other pages in your archive due to snapshot CSRF and XSS protections being imperfect. See the [Security Overview](https://github.com/ArchiveBox/ArchiveBox/wiki/Security-Overview#stealth-mode) page for more details.
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# visiting an archived page with malicious JS:
|
# visiting an archived page with malicious JS:
|
||||||
https://127.0.0.1:8000/archive/1602401954/example.com/index.html
|
https://127.0.0.1:8000/archive/1602401954/example.com/index.html
|
||||||
|
@@ -323,6 +327,7 @@ https://127.0.0.1:8000/archive/*
|
||||||
```
|
```
|
||||||
|
|
||||||
Support for saving multiple snapshots of each site over time will be [added soon](https://github.com/ArchiveBox/ArchiveBox/issues/179) (along with the ability to view diffs of the changes between runs). For now ArchiveBox is designed to only archive each URL with each extractor type once. A workaround to take multiple snapshots of the same URL is to make them slightly different by adding a hash:
|
Support for saving multiple snapshots of each site over time will be [added soon](https://github.com/ArchiveBox/ArchiveBox/issues/179) (along with the ability to view diffs of the changes between runs). For now ArchiveBox is designed to only archive each URL with each extractor type once. A workaround to take multiple snapshots of the same URL is to make them slightly different by adding a hash:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
archivebox add 'https://example.com#2020-10-24'
|
archivebox add 'https://example.com#2020-10-24'
|
||||||
...
|
...
|
||||||
|
@@ -442,29 +447,41 @@ All contributions to ArchiveBox are welcomed! Check our [issues](https://github.
|
||||||
|
|
||||||
### Setup the dev environment
|
### Setup the dev environment
|
||||||
|
|
||||||
First, install the system dependencies from the "Bare Metal" section above.
|
#### 1. Clone the main code repo (making sure to pull the submodules as well)
|
||||||
Then you can clone the ArchiveBox repo and install
|
|
||||||
```python3
|
```bash
|
||||||
git clone https://github.com/ArchiveBox/ArchiveBox && cd ArchiveBox
|
git clone --recurse-submodules https://github.com/ArchiveBox/ArchiveBox
|
||||||
git checkout master # or the branch you want to test
|
cd ArchiveBox
|
||||||
|
git checkout dev # or the branch you want to test
|
||||||
git submodule update --init --recursive
|
git submodule update --init --recursive
|
||||||
git pull --recurse-submodules
|
git pull --recurse-submodules
|
||||||
|
```
|
||||||
|
|
||||||
|
#### 2. Option A: Install the Python, JS, and system dependencies directly on your machine
|
||||||
|
|
||||||
|
```bash
|
||||||
# Install ArchiveBox + python dependencies
|
# Install ArchiveBox + python dependencies
|
||||||
python3 -m venv .venv && source .venv/bin/activate && pip install -e .[dev]
|
python3 -m venv .venv && source .venv/bin/activate && pip install -e '.[dev]'
|
||||||
# or with pipenv: pipenv install --dev && pipenv shell
|
# or: pipenv install --dev && pipenv shell
|
||||||
|
|
||||||
# Install node dependencies
|
# Install node dependencies
|
||||||
npm install
|
npm install
|
||||||
|
|
||||||
# Optional: install extractor dependencies manually or with helper script
|
# Check to see if anything is missing
|
||||||
|
archivebox --version
|
||||||
|
# install any missing dependencies manually, or use the helper script:
|
||||||
./bin/setup.sh
|
./bin/setup.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
#### 2. Option B: Build the docker container and use that for development instead
|
||||||
|
|
||||||
|
```bash
|
||||||
# Optional: develop via docker by mounting the code dir into the container
|
# Optional: develop via docker by mounting the code dir into the container
|
||||||
# if you edit e.g. ./archivebox/core/models.py on the docker host, runserver
|
# if you edit e.g. ./archivebox/core/models.py on the docker host, runserver
|
||||||
# inside the container will reload and pick up your changes
|
# inside the container will reload and pick up your changes
|
||||||
docker build . -t archivebox
|
docker build . -t archivebox
|
||||||
docker run -it -p 8000:8000 \
|
docker run -it --rm archivebox version
|
||||||
|
docker run -it --rm -p 8000:8000 \
|
||||||
-v $PWD/data:/data \
|
-v $PWD/data:/data \
|
||||||
-v $PWD/archivebox:/app/archivebox \
|
-v $PWD/archivebox:/app/archivebox \
|
||||||
archivebox server 0.0.0.0:8000 --debug --reload
|
archivebox server 0.0.0.0:8000 --debug --reload
|
||||||
|
@@ -495,7 +512,7 @@ You can also run all these in Docker. For more examples see the Github Actions C
|
||||||
cd archivebox/
|
cd archivebox/
|
||||||
./manage.py makemigrations
|
./manage.py makemigrations
|
||||||
|
|
||||||
cd data/
|
cd path/to/test/data/
|
||||||
archivebox shell
|
archivebox shell
|
||||||
```
|
```
|
||||||
(uses `pytest -s`)
|
(uses `pytest -s`)
|
||||||
|
@@ -517,9 +534,14 @@ archivebox shell
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
./bin/release.sh
|
./bin/release.sh
|
||||||
```
|
|
||||||
(bumps the version, builds, and pushes a release to PyPI, Docker Hub, and Github Packages)
|
|
||||||
|
|
||||||
|
# or individually:
|
||||||
|
./bin/release_docs.sh
|
||||||
|
./bin/release_pip.sh
|
||||||
|
./bin/release_deb.sh
|
||||||
|
./bin/release_brew.sh
|
||||||
|
./bin/release_docker.sh
|
||||||
|
```
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
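The Caveats hunk above describes a workaround for taking repeated snapshots of one URL: append a different hash fragment each time so that ArchiveBox treats it as a new URL. A minimal sketch of that idea follows. `snapshot_url` is a hypothetical helper name, not part of ArchiveBox, and the sketch assumes the input URL does not already carry a fragment of its own.

```bash
# Hypothetical helper (not an ArchiveBox command): build a URL with a date
# fragment so that each day's run is seen as a distinct URL.
# Assumes the input URL has no '#fragment' of its own.
snapshot_url() {
    url="$1"
    # Use the second argument as the stamp if given, otherwise today's date
    stamp="${2:-$(date +%Y-%m-%d)}"
    printf '%s#%s\n' "$url" "$stamp"
}

# prints https://example.com#2020-10-24
snapshot_url 'https://example.com' '2020-10-24'
```

The result could then be passed to the command shown in the diff, for example `archivebox add "$(snapshot_url 'https://example.com')"`. A date stamp gives at most one distinct URL per day; a finer-grained stamp would be needed for more frequent snapshots.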
|