ArchiveBox

The open-source self-hosted web archive. Preserve digital content for future generations.

Features

Powerful Archiving

Saves HTML, JS, PDFs, media, and more from any URL, browser history export, or bookmarks file. It can also extract media and run custom scripts.

Flexible Inputs

Import links from browser history, bookmarks, Pocket, Pinboard, Instapaper, Shaarli, Wallabag, Unmark.it, Reddit Saved Posts, Mastodon Favorites, and more.

Comprehensive Archiving

Saves HTML, PDF, screenshots, media files, git repositories, and more in a self-contained filesystem-based archive for maximum durability.

Supported Inputs & Outputs

Supported Inputs

ArchiveBox can process several types of input:

Browser bookmark exports
Browser history exports
Bookmark exports from Pocket, Pinboard, Instapaper, and similar services
RSS feeds
Raw lists of URLs
Any text file containing URLs
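
As a rough illustration of the last item, the sketch below previews which URLs a plain text file contains before it is handed to ArchiveBox. The file name notes.txt, the helper extract_urls, and the regular expression are assumptions made for this example; ArchiveBox uses its own parsers, so the preview is only an approximation of what it would pick up.

```shell
# Hypothetical sample file; any text that contains URLs works the same way.
cat > notes.txt <<'EOF'
Read later: https://example.com/article and http://example.org/page?id=1.
No link on this line.
Duplicate: https://example.com/article
EOF

# Illustrative helper (not part of ArchiveBox): list the unique URLs in a file.
# The regex is a rough approximation, not ArchiveBox's own URL parser.
extract_urls() {
    grep -oE 'https?://[^[:space:]<>"]+' "$1" \
        | sed 's/[.,;)]*$//' \
        | LC_ALL=C sort -u
}

extract_urls notes.txt

# From inside an initialized ArchiveBox data directory, the same file can then
# be piped in on stdin:
#   archivebox add < notes.txt
```

The preview step is optional; it is mainly useful for checking a large notes file or export before importing it.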

Use Cases

Lawyers

Preserve evidence, archive case-related websites, and maintain a comprehensive digital record of online resources relevant to legal proceedings.

Journalists

Archive sources, save web pages for future reference, and create a personal database of research materials for investigative reporting.

Libraries

Build digital collections, preserve online content for academic research, and ensure long-term access to web-based resources for patrons.

Quickstart

Get Started with ArchiveBox

Follow these steps to start archiving your web content:

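Install ArchiveBox with pip: pip install archivebox
Create a new archive in an empty directory: mkdir -p ~/archivebox && cd ~/archivebox && archivebox init
Add some URLs (run from inside the archive directory): archivebox add 'https://example.com'
Start the web UI, listening on all interfaces on port 8000: archivebox server 0.0.0.0:8000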
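
The steps above can be collected into a small shell function. This is a minimal sketch, not an official installer: the function name quickstart and the default directory are assumptions for illustration, and it assumes that archivebox init operates on the current working directory. It runs in a subshell so the caller's working directory is left unchanged, and it stops early with a hint if the archivebox command is not installed.

```shell
# Hypothetical helper: set up a data directory and archive one example URL.
# The body runs in a subshell, so the caller's working directory is untouched.
quickstart() (
    data_dir="${1:-$HOME/archivebox}"   # assumed default location

    # Create the data directory and work from inside it.
    mkdir -p "$data_dir" && cd "$data_dir" || exit 1

    # Stop with a hint if the CLI is missing instead of failing obscurely.
    if ! command -v archivebox >/dev/null 2>&1; then
        echo "archivebox not found on PATH; install it with: pip install archivebox" >&2
        exit 127
    fi

    archivebox init || exit 1
    archivebox add 'https://example.com' || exit 1

    echo "Archive ready in $data_dir"
    echo "Start the web UI with: cd $data_dir && archivebox server 0.0.0.0:8000"
)

# Example call (commented out so that sourcing this file has no side effects):
#   quickstart ~/archivebox
```

The server command is only printed rather than run, because it stays in the foreground until interrupted.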

Documentation

Usage

Check out the Usage page for more details on how to use ArchiveBox effectively.

Configuration

Learn about the various configuration options in the Configuration guide.