ArchiveBox/archivebox/extractors
2024-05-07 05:38:29 -07:00
__init__.py minor fixes 2024-02-22 04:50:22 -08:00
archive_org.py Remove unnecessary variable length args for dedupe 2024-03-05 21:13:45 -06:00
dom.py After a timeout, chrome will leave behind a SingletonLock, which prevents future instances of chrome from starting. When an extractor fails due to a timeout, remove this file. 2023-08-28 17:27:03 +02:00
favicon.py Remove unnecessary variable length args for dedupe 2024-03-05 21:13:45 -06:00
git.py Refactor should_save_extractor methods to accept overwrite parameter 2021-01-21 15:56:32 -06:00
headers.py Remove unnecessary variable length args for dedupe 2024-03-05 21:13:45 -06:00
htmltotext.py new archivebox update speed improvements 2024-02-22 04:50:22 -08:00
media.py Remove unnecessary variable length args for dedupe 2024-03-05 21:13:45 -06:00
mercury.py Add ARGS and EXTRA_ARGS for Mercury extractor 2024-03-05 21:15:38 -06:00
pdf.py After a timeout, chrome will leave behind a SingletonLock, which prevents future instances of chrome from starting. When an extractor fails due to a timeout, remove this file. 2023-08-28 17:27:03 +02:00
readability.py tag URLs immediately once added instead of waiting until archival completes 2024-01-03 20:31:46 -08:00
screenshot.py After a timeout, chrome will leave behind a SingletonLock, which prevents future instances of chrome from starting. When an extractor fails due to a timeout, remove this file. 2023-08-28 17:27:03 +02:00
singlefile.py Merge branch 'main' into dev 2024-03-26 15:01:36 -07:00
title.py Remove unnecessary variable length args for dedupe 2024-03-05 21:13:45 -06:00
wget.py more fixes for wget_output_path 2024-05-07 05:38:29 -07:00
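
The commit message on dom.py, pdf.py, and screenshot.py describes a cleanup step: when a Chrome-based extractor times out, the leftover SingletonLock file is removed so later Chrome instances can start. A minimal sketch of that idea follows. It is not ArchiveBox's actual code: the function name `run_with_lock_cleanup` and the `user_data_dir` parameter are hypothetical, and it assumes the lock sits directly inside the Chrome user data directory.

```python
import subprocess
from pathlib import Path


def run_with_lock_cleanup(cmd, user_data_dir, timeout):
    """Run cmd; on timeout, delete a leftover SingletonLock (hypothetical helper).

    Returns True if the command finished within the timeout, False otherwise.
    """
    try:
        # subprocess.run kills the child process when the timeout expires
        subprocess.run(cmd, timeout=timeout, check=False)
        return True
    except subprocess.TimeoutExpired:
        # Assumed location of the lock; missing_ok (Python 3.8+) avoids an
        # error when no lock was left behind
        (Path(user_data_dir) / "SingletonLock").unlink(missing_ok=True)
        return False
```

The lock is only removed on the timeout path, so a run that completes normally leaves the directory untouched.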