diff --git a/Dockerfile b/Dockerfile
index 609b368e..19e2bd51 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -20,7 +20,7 @@ RUN apt-get update && apt-get install -y curl --no-install-recommends \
 # ADD https://github.com/Yelp/dumb-init/releases/download/v1.2.0/dumb-init_1.2.0_amd64 /usr/local/bin/dumb-init
 # RUN chmod +x /usr/local/bin/dumb-init
-RUN git clone https://github.com/pirate/bookmark-archiver /home/chromeuser/app \
+RUN git clone https://github.com/pirate/ArchiveBox /home/chromeuser/app \
     && pip3 install -r /home/chromeuser/app/archiver/requirements.txt
 # Add user so we area strong, independent chrome that don't need --no-sandbox.
diff --git a/README.md b/README.md
index a2ac5ae2..8c14471f 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,10 @@
-# ArchiveBox: Open source local web archiving [![Github Stars](https://img.shields.io/github/stars/pirate/bookmark-archiver.svg)](https://github.com/pirate/bookmark-archiver) [![Twitter URL](https://img.shields.io/twitter/url/http/shields.io.svg?style=social)](https://twitter.com/thesquashSH)
+# ArchiveBox: Open source local web archiving [![Github Stars](https://img.shields.io/github/stars/pirate/bookmark-archiver.svg)](https://github.com/pirate/ArchiveBox) [![Twitter URL](https://img.shields.io/twitter/url/http/shields.io.svg?style=social)](https://twitter.com/thesquashSH)
-### (Recently [renamed](https://github.com/pirate/ArchiveBox/issues/108) from `Bookmark Archiver`)
+ ### (Recently [renamed](https://github.com/pirate/ArchiveBox/issues/108) from `Bookmark Archiver`)
 "Your own personal Way-Back Machine"
-▶️ [Quickstart](#quickstart) | [Details](#details) | [Configuration](#configuration) | [Manual Setup](#manual-setup) | [Troubleshooting](#troubleshooting) | [Demo](https://archive.sweeting.me) | [Source](https://github.com/pirate/bookmark-archiver/tree/master) | [Changelog](#changelog) | [Donate](https://github.com/pirate/bookmark-archiver/blob/master/DONATE.md)
+▶️ [Quickstart](#quickstart) | [Details](#details) | [Configuration](#configuration) | [Manual Setup](#manual-setup) | [Troubleshooting](#troubleshooting) | [Demo](https://archive.sweeting.me) | [Source](https://github.com/pirate/ArchiveBox/tree/master) | [Changelog](#changelog) | [Donate](https://github.com/pirate/ArchiveBox/blob/master/DONATE.md)
 ---
@@ -62,8 +62,8 @@ Follow the links here to find instructions for exporting a list of URLs from eac
 **2. Create your archive:**
 ```bash
-git clone https://github.com/pirate/bookmark-archiver
-cd bookmark-archiver/
+git clone https://github.com/pirate/ArchiveBox
+cd ArchiveBox/
 ./setup # install all dependencies
 # add a list of links from a file
@@ -95,8 +95,8 @@ it will keep the index up-to-date without duplicate links.
 This example archives a pocket RSS feed and an export file every 24 hours, and saves the output to a logfile.
 ```bash
-0 24 * * * yourusername /opt/bookmark-archiver/archive https://getpocket.com/users/yourusername/feed/all > /var/log/bookmark_archiver_rss.log
-0 24 * * * yourusername /opt/bookmark-archiver/archive /home/darth-vader/Desktop/bookmarks.html > /var/log/bookmark_archiver_firefox.log
+0 24 * * * yourusername /opt/ArchiveBox/archive https://getpocket.com/users/yourusername/feed/all > /var/log/archivebox_rss.log
+0 24 * * * yourusername /opt/ArchiveBox/archive /home/darth-vader/Desktop/bookmarks.html > /var/log/archivebox_firefox.log
 ```
 (Add the above lines to `/etc/crontab`)
@@ -190,13 +190,13 @@ The chrome/chromium dependency is _optional_ and only required for screenshots,
 The archive produced by `./archive` is suitable for serving on any provider that can host static html (e.g. github pages!).
-You can also serve it from a home server or VPS by uploading the outputted `output` folder to your web directory, e.g. `/var/www/bookmark-archiver` and configuring your webserver.
+You can also serve it from a home server or VPS by uploading the outputted `output` folder to your web directory, e.g. `/var/www/ArchiveBox` and configuring your webserver.
 Here's a sample nginx configuration that works to serve archive folders:
 ```nginx
 location / {
-    alias /path/to/bookmark-archiver/output/;
+    alias /path/to/ArchiveBox/output/;
     index index.html;
     autoindex on; # see directory listing upon clicking "The Files" links
     try_files $uri $uri/ =404;
@@ -266,8 +266,8 @@ Follow the instruction links above in the "Quickstart" section to download your
 **3. Run the archive script:**
-1. Clone this repo `git clone https://github.com/pirate/bookmark-archiver`
-3. `cd bookmark-archiver/`
+1. Clone this repo `git clone https://github.com/pirate/ArchiveBox`
+3. `cd ArchiveBox/`
 4. `./archive ~/Downloads/bookmarks_export.html`
 You may optionally specify a second argument to `archive.py export.html 153242424324` to resume the archive update at a specific timestamp.
@@ -369,7 +369,7 @@ a bug in versions `<=1.19.1_1` that caused wget to fail for perfectly valid site
 **No links parsed from export file:**
-Please open an [issue](https://github.com/pirate/bookmark-archiver/issues) with a description of where you got the export, and
+Please open an [issue](https://github.com/pirate/ArchiveBox/issues) with a description of where you got the export, and
 preferrably your export file attached (you can redact the links). We'll fix the parser to support your format.
 **Lots of skipped sites:**
@@ -383,12 +383,12 @@ If you're still having issues, try deleting or moving the `output/archive` folde
 **Lots of errors:**
 Make sure you have all the dependencies installed and that you're able to visit the links from your browser normally.
-Open an [issue](https://github.com/pirate/bookmark-archiver/issues) with a description of the errors if you're still having problems.
+Open an [issue](https://github.com/pirate/ArchiveBox/issues) with a description of the errors if you're still having problems.
 **Lots of broken links from the index:**
 Not all sites can be effectively archived with each method, that's why it's best to use a combination of `wget`, PDFs, and screenshots.
-If it seems like more than 10-20% of sites in the archive are broken, open an [issue](https://github.com/pirate/bookmark-archiver/issues)
+If it seems like more than 10-20% of sites in the archive are broken, open an [issue](https://github.com/pirate/ArchiveBox/issues)
 with some of the URLs that failed to be archived and I'll investigate.
 **Removing unwanted links from the index:**
@@ -398,7 +398,7 @@ If you accidentally added lots of unwanted links into index and they slow down y
 ### Hosting the Archive
 If you're having issues trying to host the archive via nginx, make sure you already have nginx running with SSL.
-If you don't, google around, there are plenty of tutorials to help get that set up. Open an [issue](https://github.com/pirate/bookmark-archiver/issues)
+If you don't, google around, there are plenty of tutorials to help get that set up. Open an [issue](https://github.com/pirate/ArchiveBox/issues)
 if you have problem with a particular nginx config.
@@ -468,10 +468,10 @@ If you feel like contributing a PR, some of these tasks are pretty easy. Feel f
  - Index links now work without nginx url rewrites, archive can now be hosted on github pages
  - added setup.sh script & docstrings & help commands
  - made Chromium the default instead of Google Chrome (yay free software)
- - added [env-variable](https://github.com/pirate/bookmark-archiver/pull/25) configuration (thanks to https://github.com/hannah98!)
+ - added [env-variable](https://github.com/pirate/ArchiveBox/pull/25) configuration (thanks to https://github.com/hannah98!)
  - renamed from **Pocket Archive Stream** -> **Bookmark Archiver**
- - added [Netscape-format](https://github.com/pirate/bookmark-archiver/pull/20) export support (thanks to https://github.com/ilvar!)
- - added [Pinboard-format](https://github.com/pirate/bookmark-archiver/pull/7) export support (thanks to https://github.com/sconeyard!)
+ - added [Netscape-format](https://github.com/pirate/ArchiveBox/pull/20) export support (thanks to https://github.com/ilvar!)
+ - added [Pinboard-format](https://github.com/pirate/ArchiveBox/pull/7) export support (thanks to https://github.com/sconeyard!)
  - front-page of HN, oops! apparently I have users to support now :grin:?
  - added Pocket-format export support
  - v0.0.0 released: created Pocket Archive Stream 2017/05/05
@@ -485,4 +485,4 @@ If you feel like contributing a PR, some of these tasks are pretty easy. Feel f
 talented engineers. If you want to help sponsor this project long-term or just say thanks or suggest changes, contact me at bookmark-archiver@sweeting.me.
- [Grants / Donations](https://github.com/pirate/bookmark-archiver/blob/master/DONATE.md)
+ [Grants / Donations](https://github.com/pirate/ArchiveBox/blob/master/DONATE.md)
diff --git a/archiver/archive.py b/archiver/archive.py
index 64aa0f25..73959c6d 100755
--- a/archiver/archive.py
+++ b/archiver/archive.py
@@ -1,7 +1,7 @@
 #!/usr/bin/env python3
-# Bookmark Archiver
+# ArchiveBox
 # Nick Sweeting 2017 | MIT License
-# https://github.com/pirate/bookmark-archiver
+# https://github.com/pirate/ArchiveBox
 import os
 import sys
@@ -39,14 +39,14 @@ from util import (
 __AUTHOR__ = 'Nick Sweeting '
 __VERSION__ = GIT_SHA
-__DESCRIPTION__ = 'Bookmark Archiver: Create a browsable html archive of a list of links.'
-__DOCUMENTATION__ = 'https://github.com/pirate/bookmark-archiver'
+__DESCRIPTION__ = 'ArchiveBox: Create a browsable html archive of a list of links.'
+__DOCUMENTATION__ = 'https://github.com/pirate/ArchiveBox'
 def print_help():
     print(__DESCRIPTION__)
     print("Documentation: {}\n".format(__DOCUMENTATION__))
     print("Usage:")
-    print(" ./bin/bookmark-archiver ~/Downloads/bookmarks_export.html\n")
+    print(" ./bin/archivebox ~/Downloads/bookmarks_export.html\n")
 def merge_links(archive_path=OUTPUT_DIR, import_path=None, only_new=False):
diff --git a/archiver/config.py b/archiver/config.py
index 1fc2eb0a..a8a5f7c0 100644
--- a/archiver/config.py
+++ b/archiver/config.py
@@ -28,7 +28,7 @@ CHECK_SSL_VALIDITY = os.getenv('CHECK_SSL_VALIDITY', 'True'
 OUTPUT_PERMISSIONS = os.getenv('OUTPUT_PERMISSIONS', '755' )
 CHROME_BINARY = os.getenv('CHROME_BINARY', 'chromium-browser' ) # change to google-chrome browser if using google-chrome
 WGET_BINARY = os.getenv('WGET_BINARY', 'wget' )
-WGET_USER_AGENT = os.getenv('WGET_USER_AGENT', 'Bookmark Archiver')
+WGET_USER_AGENT = os.getenv('WGET_USER_AGENT', 'ArchiveBox')
 CHROME_USER_DATA_DIR = os.getenv('CHROME_USER_DATA_DIR', None)
 TIMEOUT = int(os.getenv('TIMEOUT', '60'))
 FOOTER_INFO = os.getenv('FOOTER_INFO', 'Content is hosted for personal archiving purposes only. Contact server owner for any takedown requests.',)
diff --git a/archiver/index.py b/archiver/index.py
index 21f68697..d8cf5b67 100644
--- a/archiver/index.py
+++ b/archiver/index.py
@@ -43,8 +43,8 @@ def write_json_links_index(out_dir, links):
     path = os.path.join(out_dir, 'index.json')
     index_json = {
-        'info': 'Bookmark Archiver Index',
-        'help': 'https://github.com/pirate/bookmark-archiver',
+        'info': 'ArchiveBox Index',
+        'help': 'https://github.com/pirate/ArchiveBox',
         'version': GIT_SHA,
         'num_links': len(links),
         'updated': str(datetime.now().timestamp()),
diff --git a/archiver/links.py b/archiver/links.py
index f48f1a02..e544618a 100644
--- a/archiver/links.py
+++ b/archiver/links.py
@@ -1,5 +1,5 @@
 """
-In Bookmark Archiver, a Link represents a single entry that we track in the
+In ArchiveBox, a Link represents a single entry that we track in the
 json index. All links pass through all archiver functions and the latest,
 most up-to-date canonical output for each is stored in "latest".
diff --git a/archiver/templates/index.html b/archiver/templates/index.html
index 156f1e4d..273b69dd 100644
--- a/archiver/templates/index.html
+++ b/archiver/templates/index.html
@@ -110,7 +110,7 @@
- + Github @@ -143,8 +143,8 @@
- Archive created using Bookmark Archiver - version $short_git_sha   |   + Archive created using ArchiveBox + version $short_git_sha   |   Download index as JSON

$footer_info
diff --git a/archiver/templates/link_index.html b/archiver/templates/link_index.html
index 854fcaf2..d1bbccd5 100644
--- a/archiver/templates/link_index.html
+++ b/archiver/templates/link_index.html
@@ -56,7 +56,7 @@
Archive Icon
- Bookmark Archiver: Link Index
+ ArchiveBox: Link Index
diff --git a/archiver/tests/firefox_export.html b/archiver/tests/firefox_export.html
index 349ffe10..99d0bd0e 100644
--- a/archiver/tests/firefox_export.html
+++ b/archiver/tests/firefox_export.html
@@ -21,7 +21,7 @@
firefox export bookmarks at DuckDuckGo
archive firefox bookmarks at DuckDuckGo
nodiscc (nodiscc) · GitHub
- pirate/bookmark-archiver · Github
+ pirate/ArchiveBox · Github
Phonotactic Reconstruction of Encrypted VoIP Conversations
Firefox Bookmarks Archiver - gHacks Tech News

diff --git a/archiver/util.py b/archiver/util.py
index e4c7dbe3..5ad51a1e 100644
--- a/archiver/util.py
+++ b/archiver/util.py
@@ -49,14 +49,14 @@ def check_dependencies():
     python_vers = float('{}.{}'.format(sys.version_info.major, sys.version_info.minor))
     if python_vers < 3.5:
         print('{}[X] Python version is not new enough: {} (>3.5 is required){}'.format(ANSI['red'], python_vers, ANSI['reset']))
-        print(' See https://github.com/pirate/bookmark-archiver#troubleshooting for help upgrading your Python installation.')
+        print(' See https://github.com/pirate/ArchiveBox#troubleshooting for help upgrading your Python installation.')
         raise SystemExit(1)
     if FETCH_PDF or FETCH_SCREENSHOT or FETCH_DOM:
         if run(['which', CHROME_BINARY], stdout=DEVNULL).returncode:
             print('{}[X] Missing dependency: {}{}'.format(ANSI['red'], CHROME_BINARY, ANSI['reset']))
             print(' Run ./setup.sh, then confirm it was installed with: {} --version'.format(CHROME_BINARY))
-            print(' See https://github.com/pirate/bookmark-archiver for help.')
+            print(' See https://github.com/pirate/ArchiveBox for help.')
             raise SystemExit(1)
         # parse chrome --version e.g. Google Chrome 61.0.3114.0 canary / Chromium 59.0.3029.110 built on Ubuntu, running on Ubuntu 16.04
@@ -68,33 +68,33 @@ def check_dependencies():
             if int(version) < 59:
                 print(version_lines)
                 print('{red}[X] Chrome version must be 59 or greater for headless PDF, screenshot, and DOM saving{reset}'.format(**ANSI))
-                print(' See https://github.com/pirate/bookmark-archiver for help.')
+                print(' See https://github.com/pirate/ArchiveBox for help.')
                 raise SystemExit(1)
         except (IndexError, TypeError, OSError):
             print('{red}[X] Failed to parse Chrome version, is it installed properly?{reset}'.format(**ANSI))
             print(' Run ./setup.sh, then confirm it was installed with: {} --version'.format(CHROME_BINARY))
-            print(' See https://github.com/pirate/bookmark-archiver for help.')
+            print(' See https://github.com/pirate/ArchiveBox for help.')
             raise SystemExit(1)
     if FETCH_WGET:
         if run(['which', 'wget'], stdout=DEVNULL).returncode or run(['wget', '--version'], stdout=DEVNULL).returncode:
             print('{red}[X] Missing dependency: wget{reset}'.format(**ANSI))
             print(' Run ./setup.sh, then confirm it was installed with: {} --version'.format('wget'))
-            print(' See https://github.com/pirate/bookmark-archiver for help.')
+            print(' See https://github.com/pirate/ArchiveBox for help.')
             raise SystemExit(1)
     if FETCH_FAVICON or SUBMIT_ARCHIVE_DOT_ORG:
         if run(['which', 'curl'], stdout=DEVNULL).returncode or run(['curl', '--version'], stdout=DEVNULL).returncode:
             print('{red}[X] Missing dependency: curl{reset}'.format(**ANSI))
             print(' Run ./setup.sh, then confirm it was installed with: {} --version'.format('curl'))
-            print(' See https://github.com/pirate/bookmark-archiver for help.')
+            print(' See https://github.com/pirate/ArchiveBox for help.')
             raise SystemExit(1)
     if FETCH_AUDIO or FETCH_VIDEO:
         if run(['which', 'youtube-dl'], stdout=DEVNULL).returncode or run(['youtube-dl', '--version'], stdout=DEVNULL).returncode:
             print('{red}[X] Missing dependency: youtube-dl{reset}'.format(**ANSI))
             print(' Run ./setup.sh, then confirm it was installed with: {} --version'.format('youtube-dl'))
-            print(' See https://github.com/pirate/bookmark-archiver for help.')
+            print(' See https://github.com/pirate/ArchiveBox for help.')
             raise SystemExit(1)
@@ -174,7 +174,7 @@ def progress(seconds=TIMEOUT, prefix=''):
     return end
 def pretty_path(path):
-    """convert paths like .../bookmark-archiver/archiver/../output/abc into output/abc"""
+    """convert paths like .../ArchiveBox/archiver/../output/abc into output/abc"""
     return path.replace(REPO_DIR + '/', '')
@@ -319,7 +319,7 @@ def manually_merge_folders(source, target):
     assert answer in ('', 'a', 'b', 'q'), 'Invalid choice.'
     if answer == 'q':
-        print('\nJust run Bookmark Archiver again to pick up where you left off.')
+        print('\nJust run ArchiveBox again to pick up where you left off.')
         raise SystemExit(0)
     elif answer == '':
         return
@@ -409,7 +409,7 @@ def cleanup_archive(archive_path, links):
         for folder, link in bad_folders:
             fix_folder_path(archive_path, folder, link)
     elif bad_folders:
-        print('[!] Warning! {} folders need to be merged, fix by running bookmark archiver.'.format(len(bad_folders)))
+        print('[!] Warning! {} folders need to be merged, fix by running ArchiveBox.'.format(len(bad_folders)))
     if unmatched:
         print('[!] Warning! {} unrecognized folders in html/archive/'.format(len(unmatched)))
diff --git a/bin/bookmark-archiver b/bin/archivebox
similarity index 100%
rename from bin/bookmark-archiver
rename to bin/archivebox
diff --git a/bin/setup-bookmark-archiver b/bin/setup-archivebox
similarity index 95%
rename from bin/setup-bookmark-archiver
rename to bin/setup-archivebox
index 6163a275..3217a4b1 100755
--- a/bin/setup-bookmark-archiver
+++ b/bin/setup-archivebox
@@ -1,9 +1,9 @@
 #!/bin/bash
-# Bookmark Archiver Setup Script
+# ArchiveBox Setup Script
 # Nick Sweeting 2017 | MIT License
-# https://github.com/pirate/bookmark-archiver
+# https://github.com/pirate/ArchiveBox
-echo "[i] Installing bookmark-archiver dependencies. 📦"
+echo "[i] Installing ArchiveBox dependencies. 📦"
 echo ""
 echo " You may be prompted for a password in order to install the following dependencies:"
 echo " - Chromium Browser (see README for Google-Chrome instructions instead)"
@@ -84,5 +84,5 @@ echo ""
 echo "[X] Failed to install some dependencies! ‼️"
 echo " - Try the Manual Setup instructions in the README.md"
 echo " - Try the Troubleshooting: Dependencies instructions in the README.md"
-echo " - Open an issue on github to get help: https://github.com/pirate/bookmark-archiver/issues"
+echo " - Open an issue on github to get help: https://github.com/pirate/ArchiveBox/issues"
 exit 1