ArchiveBox/archivebox/extractors
2024-01-03 20:31:46 -08:00
__init__.py config.py lint fixes 2023-11-14 02:07:35 -08:00
archive_org.py enforce utf8 on literally all file operations because windows sucks 2021-03-27 01:16:29 -04:00
dom.py After a timeout, chrome will leave behind a SingletonLock, which prevents future instances of chrome from starting. When an extractor fails due to a timeout, remove this file. 2023-08-28 17:27:03 +02:00
favicon.py Add FAVICON_PROVIDER option for custom favicon service 2023-05-05 20:42:36 -05:00
git.py Refactor should_save_extractor methods to accept overwrite parameter 2021-01-21 15:56:32 -06:00
headers.py Refactor should_save_extractor methods to accept overwrite parameter 2021-01-21 15:56:32 -06:00
htmltotext.py Add htmltotext extractor 2023-10-23 21:42:32 -04:00
media.py Don't be strict on unicode errors 2022-09-12 20:40:45 +00:00
mercury.py improve readability and mercury error handling and fix output path to be relative 2021-02-16 15:53:11 -05:00
pdf.py After a timeout, chrome will leave behind a SingletonLock, which prevents future instances of chrome from starting. When an extractor fails due to a timeout, remove this file. 2023-08-28 17:27:03 +02:00
readability.py tag URLs immediately once added instead of waiting until archival completes 2024-01-03 20:31:46 -08:00
screenshot.py After a timeout, chrome will leave behind a SingletonLock, which prevents future instances of chrome from starting. When an extractor fails due to a timeout, remove this file. 2023-08-28 17:27:03 +02:00
singlefile.py add CHROME_TIMEOUT args 2023-03-14 20:29:41 +09:00
title.py prefer dom dump to singlefile for generating readability output 2024-01-03 20:11:06 -08:00
wget.py add timezone support, tons of CSS and layout improvements, more detailed snapshot admin form info, ability to sort by recently updated, better grid view styling, better table layouts, better dark mode support 2021-04-10 04:21:36 -04:00
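
The commit message on dom.py, pdf.py and screenshot.py describes a cleanup step: when a Chrome-based extractor times out, the leftover SingletonLock file is removed so that later Chrome instances can start. Below is a minimal sketch of that idea, not the code from those files. The function names `remove_singleton_lock` and `run_with_lock_cleanup` and the `user_data_dir` parameter are hypothetical, and the sketch assumes the lock sits directly inside the Chrome user data directory.

```python
import subprocess
from pathlib import Path


def remove_singleton_lock(user_data_dir: Path) -> bool:
    """Delete a leftover SingletonLock from a Chrome profile directory.

    Returns True if a lock entry was removed, False if none was present.
    (Hypothetical helper; the location of the lock is an assumption.)
    """
    lock = Path(user_data_dir) / "SingletonLock"
    # The lock may be a symlink whose target does not exist, in which case
    # exists() is False, so is_symlink() is checked as well.
    if lock.is_symlink() or lock.exists():
        lock.unlink()
        return True
    return False


def run_with_lock_cleanup(cmd, timeout, user_data_dir):
    """Run cmd with a timeout; on timeout, remove the stale lock and re-raise."""
    try:
        return subprocess.run(cmd, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point, so the
        # lock it left behind is stale and safe to delete.
        remove_singleton_lock(Path(user_data_dir))
        raise
```

Re-raising the timeout keeps the failure visible to the caller, so the extractor can still be reported as failed while the next run starts from a clean profile directory.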