This commit is contained in:
Nick Sweeting 2017-05-05 05:10:50 -04:00
parent 206b5fc57f
commit 6c957556e1


@ -1,2 +1,53 @@
# Pocket Stream Archive
Save an archived copy of all websites starred using Pocket, indexed in an HTML file.
![](screenshot.png)
## Quickstart
**Dependencies:** Google Chrome headless, wget
```bash
# macOS
brew install Caskroom/versions/google-chrome-canary
brew install wget

# OR on Linux
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
sudo sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
sudo apt update; sudo apt install google-chrome-beta
```
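To check that both dependencies are available, a quick version check should work (the Chrome binary name varies by platform and channel, so adjust it to whichever build you installed):

```bash
# Sanity check -- the Chrome binary name/path depends on platform and channel,
# e.g. google-chrome, google-chrome-beta, or the Canary app on macOS.
google-chrome --version
wget --version | head -n 1
```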
**Usage:**
1. Download your pocket export file `ril_export.html` from https://getpocket.com/export
2. Download this repo `git clone https://github.com/pirate/pocket-archive-stream`
3. `cd pocket-archive-stream/`
4. `./archive.py path/to/ril_export.html`
It produces a folder `pocket/` containing an `index.html` and archived copies of all the sites,
organized by timestamp. For each site it saves (rough shell equivalents are sketched below the list):
- a wget copy of the site, with `.html` appended to the filename if not already present
- a screenshot of the site rendered by headless Chrome
- a PDF of the site rendered by headless Chrome
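Under the hood these are just wget and headless Chrome invocations. The sketch below shows roughly equivalent shell commands; the exact flags `archive.py` passes may differ:

```bash
# Illustrative only -- the exact flags archive.py uses may differ.
URL="https://example.com/some-article"

# static HTML copy (--adjust-extension appends .html to pages that lack an extension)
wget --adjust-extension --page-requisites --convert-links "$URL"

# screenshot and PDF rendered by headless Chrome
google-chrome --headless --disable-gpu --screenshot "$URL"     # writes screenshot.png
google-chrome --headless --disable-gpu --print-to-pdf "$URL"   # writes output.pdf
```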
The wget archive is suitable for serving from your personal server: upload the pocket
archive to `/var/www/pocket` and let people access your saved copies of the sites.
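For example (the hostname and paths below are placeholders, not anything `archive.py` sets up for you):

```bash
# Copy the generated archive to a web root on your server and serve it with
# any static file server. Hostname and paths are placeholders.
rsync -av pocket/ user@example.com:/var/www/pocket/
# the index is then reachable at http://example.com/pocket/index.html
```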
## Info
This is basically an open-source version of [Pocket Premium](https://getpocket.com/). I got tired of sites I saved going offline
or changing their URLs, so now I archive a copy of them whenever I save them.
I can rest soundly knowing that important articles and resources I like won't disappear from the internet.
**WARNING:**
Hosting other people's site content has security implications for your domain. Make sure you understand
the dangers of serving other people's CSS & JS files from your domain; it's best to put this on a domain
of its own to slightly mitigate CSRF attacks.
It might also be prudent to blacklist your archive in your `robots.txt` so that search engines don't index
the content on your domain.
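A minimal example, assuming the archive is served under `/pocket/` and you control the web root of that domain (paths are placeholders):

```bash
# robots.txt must live at the web root of the domain serving the archive.
# These paths are examples only -- adjust them to your server layout.
cat > /var/www/robots.txt <<'EOF'
User-agent: *
Disallow: /pocket/
EOF
```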