ArchiveBox
The open-source self-hosted web archive. Preserve digital content for future generations.
Features
Powerful Archiving
Save HTML, JS, PDFs, media, and more from any URL, browser history, or bookmarks. Supports extracting media and running custom scripts.
Flexible Inputs
Import links from browser history, bookmarks, Pocket, Pinboard, Instapaper, Shaarli, Wallabag, Unmark.it, Reddit Saved Posts, Mastodon Favorites, and more.
Comprehensive Archiving
Saves HTML, PDF, screenshots, media files, git repositories, and more in a self-contained filesystem-based archive for maximum durability.
Supported Inputs & Outputs
Supported Inputs
ArchiveBox can process several types of input:
- Browser bookmark exports
- Browser history exports
- Pocket/Pinboard/Instapaper/etc. bookmark exports
- RSS feeds
- Raw lists of URLs
- Any text file containing URLs
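To illustrate the last item, here is a minimal shell sketch of what "any text file containing URLs" can mean in practice. The helper name extract_urls, the file name notes.txt, and the regular expression are assumptions made for this example; this is not ArchiveBox's own parser, only an approximation of the idea that links can be picked out of free-form text.

```shell
# Hypothetical helper (not part of ArchiveBox): read free-form text on stdin
# and print each http(s) URL found, one per line, sorted and de-duplicated.
extract_urls() {
  grep -oE "https?://[^[:space:]\"'<>]+" | sort -u
}

# Sample input: prose, an HTML fragment, and a repeated link.
# notes.txt is an assumed file name used only for this example.
printf '%s\n' \
  'Read later: https://example.com/article and' \
  '<a href="https://example.org/page">link</a>' \
  'duplicate https://example.com/article' > notes.txt

# Preview which links the file contains.
extract_urls < notes.txt

# With ArchiveBox installed and an archive initialized in the current
# directory, the file itself can be passed on stdin:
#   archivebox add < notes.txt
```

The preview step is optional. It is only a quick way to check which links a messy file contains before importing it.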
Use Cases
Lawyers
Preserve evidence, archive case-related websites, and maintain a comprehensive digital record of online resources relevant to legal proceedings.
Journalists
Archive sources, save web pages for future reference, and create a personal database of research materials for investigative reporting.
Libraries
Build digital collections, preserve online content for academic research, and ensure long-term access to web-based resources for patrons.
Quickstart
Get Started with ArchiveBox
Follow these steps to start archiving your web content:
- Install ArchiveBox with pip:
pip install archivebox
- Create a data folder and initialize a new archive inside it:
mkdir -p ~/archivebox && cd ~/archivebox
archivebox init
- Add some URLs:
archivebox add https://example.com
- Start the web UI:
archivebox server 0.0.0.0:8000
Documentation
Usage
Check out the Usage page for more details on how to use ArchiveBox effectively.
Configuration
Learn about the various configuration options in the Configuration guide.