mirror of
https://github.com/ArchiveBox/ArchiveBox
synced 2024-11-10 06:34:16 +00:00
readme
This commit is contained in:
parent
206b5fc57f
commit
6c957556e1
1 changed files with 52 additions and 1 deletions
53
README.md
53
README.md
|
@ -1,2 +1,53 @@
|
|||
# pocket-archive-stream
|
||||
# Pocket Stream Archive
|
||||
|
||||
Save an archived copy of all websites starred using Pocket, indexed in an html file.
|
||||
|
||||
![](screenshot.png)
|
||||
|
||||
## Quickstart
|
||||
|
||||
**Dependencies:** Google Chrome headless, wget
|
||||
|
||||
```bash
|
||||
brew install Caskroom/versions/google-chrome-canary
|
||||
brew install wget
|
||||
|
||||
# OR on linux
|
||||
|
||||
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
|
||||
sudo sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
|
||||
apt update; apt install google-chrome-beta
|
||||
```
|
||||
|
||||
**Usage:**
|
||||
|
||||
1. Download your pocket export file `ril_export.html` from https://getpocket.com/export
|
||||
2. Download this repo `git clone https://github.com/pirate/pocket-archive-stream`
|
||||
3. `cd pocket-archive-stream/`
|
||||
4. `./archive.py path/to/ril_export.html`
|
||||
|
||||
It produces a folder `pocket/` containing an `index.html`, and archived copies of all the sites,
|
||||
organized by timestamp. For each sites it saves:
|
||||
- wget of site, with .html appended if not present
|
||||
- screenshot of site using headless chrome
|
||||
- PDF of site using headless chrome
|
||||
|
||||
The wget archive is suitable for serving on your personal server, you can upload the pocket
|
||||
archive to `/var/www/pocket` and allow people to access your saved copies of sites.
|
||||
|
||||
|
||||
## Info
|
||||
|
||||
This is basically an open-source version of [Pocket Premium](https://getpocket.com/). I got tired of sites I saved going offline,
|
||||
or changing their URLS, so I want to archive a copy of them whenever I save them now.
|
||||
|
||||
Now I can rest soundly knowing important articles and resources I like wont dissapear off the internet.
|
||||
|
||||
**WARNING:**
|
||||
|
||||
Hosting other people's site content has security implications for your domain, make sure you understand
|
||||
the dangers of hosting other people's CSS & JS files on your domain. It's best to put this on a domain
|
||||
of its own to slightly mitigate CSRF attacks.
|
||||
|
||||
It might also be prudent to blacklist your archive in your `robots.txt` so that search engines dont index
|
||||
the content on your domain.
|
||||
|
|
Loading…
Reference in a new issue