Nick Sweeting | 34e4b48557 | add example js extractor | 2024-12-12 22:15:17 -08:00
Nick Sweeting | 5c06b8ff00 | add new Event model to workers/models | 2024-12-12 22:08:17 -08:00
Nick Sweeting | ac53fdf677 | make chrome binary and configs directly runnable and make extractor use external bin | 2024-12-06 02:06:39 -08:00
dish | f1b9aec873 | fix syntax errors | 2024-12-05 13:52:33 -05:00
Nick Sweeting | 8c8ec6aff0 | add extractors README | 2024-12-03 02:15:17 -08:00
Nick Sweeting | 337acdac9c | add base extractor class | 2024-12-03 02:14:42 -08:00
Nick Sweeting | 4a5d607296 | move logging_util into archivebox.misc subfolder | 2024-11-18 19:08:49 -08:00
Nick Sweeting | b3c1cb716e | move abx plugins inside vendor dir | 2024-10-28 04:07:35 -07:00
Nick Sweeting | b3107ab830 | move final legacy config to plugins and fix archivebox config cmd and add search opt | 2024-10-21 02:56:00 -07:00
Nick Sweeting | 01ba6d49d3 | new vastly simplified plugin spec without pydantic | 2024-10-14 21:50:47 -07:00
Nick Sweeting | de2ab43f7f | switch .is_dir and .exists for os.access to avoid PermissionError on startup | 2024-10-08 03:02:34 -07:00
Nick Sweeting | cf1ea8f80f | improve config loading of TMP_DIR, LIB_DIR, move to separate files | 2024-10-07 23:45:11 -07:00
Nick Sweeting | 94123ca68c | fix archive_dot_org repsonse parsing bytes vs str bug | 2024-10-01 00:18:38 -07:00
Nick Sweeting | 18474f452b | move config moved out of legacy files and better version output | 2024-09-30 23:52:00 -07:00
Nick Sweeting | d21bc86075 | finish migrating almost all config to new system | 2024-09-30 23:21:34 -07:00
Nick Sweeting | 69522da4bb | move wget and mercury into plugins | 2024-09-30 21:43:45 -07:00
Nick Sweeting | 363a499289 | move util.py into misc folder | 2024-09-30 17:25:15 -07:00
Nick Sweeting | dfca4b13b2 | move system.py into misc folder | 2024-09-30 17:13:55 -07:00
Nick Sweeting | 3e5b6ddeae | move config into dedicated global app | 2024-09-30 15:59:05 -07:00
Nick Sweeting | bb65b2dbec | move almost all config into new archivebox.CONSTANTS | 2024-09-25 05:10:09 -07:00
Nick Sweeting | a5ffd4e9d3 | move pdf, screenshot, dom, singlefile, and ytdlp extractor config to new plugin system | 2024-09-25 00:42:26 -07:00
Nick Sweeting | ee5bec6a10 | flip link_archive exception throw order so real exception is easier to read at the bottom | 2024-09-25 00:39:49 -07:00
Nick Sweeting | c9c163efed | begin migrating search backends to new plugin system | 2024-09-24 02:13:01 -07:00
Nick Sweeting | 52386d9c16 | run all blocking commands in background threads and show nice UI messages as confirmation | 2024-09-06 02:54:22 -07:00
Nick Sweeting | cbf2a8fdc3 | rename datetime fields to _at, massively improve ABID generation safety and determinism | 2024-09-04 23:42:36 -07:00
Nick Sweeting | d0fefc0279 | add chunk_size=500 to more iterator calls | 2024-08-27 19:28:00 -07:00
Nick Sweeting | 24fe958ff3 | massively improve Snapshot admin list view query performance | 2024-08-26 20:16:43 -07:00
Nick Sweeting | 9b1659c72f | make created_by_id autoapply to any ArchiveResults created under Snapshot | 2024-08-20 19:43:07 -07:00
Nick Sweeting | 0285aa52a0 | config and attr access improvements | 2024-08-20 18:31:21 -07:00
Nick Sweeting | 774ce3fda7 | fix singlefile extractor exception when result is none | 2024-05-17 20:12:18 -07:00
Nick Sweeting | 0420662174 | switch everywhere to use Snapshot.pk and ArchiveResult.pk instead of id | 2024-05-13 05:12:12 -07:00
Nick Sweeting | 457c42bf84 | load EXTRACTORS dynamically using importlib.import_module | 2024-05-11 22:28:59 -07:00
Nick Sweeting | 4c5a3fba8b | more fixes for wget_output_path | 2024-05-07 05:38:29 -07:00
Nick Sweeting | 9b21ce490e | add workaround logic to catch paths that are too long or contain unprintable characters | 2024-05-07 05:03:23 -07:00
Nick Sweeting | f770bba3cf | fix OSError 36 caused by checking for path that is too long to exist | 2024-05-07 04:12:07 -07:00
Nick Sweeting | b4c3aa5097 | Merge branch 'main' into dev | 2024-03-26 15:01:36 -07:00
Ben Muthalaly | f4deb97f59 | Add ARGS and EXTRA_ARGS for Mercury extractor | 2024-03-05 21:15:38 -06:00
Ben Muthalaly | d8cf09c21e | Remove unnecessary variable length args for dedupe | 2024-03-05 21:13:45 -06:00
Naomi Phillips | a729480b75 | Add COOKIES_FILE support for singlefile extractor | 2024-03-03 02:32:46 -05:00
Ben Muthalaly | d74ddd42ae | Flip dedupe precedence order | 2024-03-01 14:50:32 -06:00
Ben Muthalaly | ab8f395e0a | Add YOUTUBEDL_EXTRA_ARGS | 2024-02-23 15:40:31 -06:00
Ben Muthalaly | 4e69d2c9e1 | Add EXTRA_*_ARGS for wget, curl, and singlefile | 2024-02-22 23:04:11 -06:00
Nick Sweeting | 8b9bc3dec8 | minor fixes | 2024-02-22 04:50:22 -08:00
Nick Sweeting | 6a4e568d1b | new archivebox update speed improvements | 2024-02-22 04:50:22 -08:00
Nick Sweeting | 0a25495520 | add fallback to check wget output dir with port stripped | 2024-01-19 03:47:38 -08:00
Nick Sweeting | c1fd2cfa42 | tag URLs immediately once added instead of waiting until archival completes | 2024-01-03 20:31:46 -08:00
Nick Sweeting | db2984e47b | prefer dom dump to singlefile for generating readability output | 2024-01-03 20:11:06 -08:00
Nick Sweeting | 78d942ac22 | show more detail in readabiliity error messages | 2024-01-03 20:09:31 -08:00
Nick Sweeting | 5b07a1126c | add comment about why DOM is preferred over singlefile for readability parsing | 2024-01-03 19:09:24 -08:00
Nick Sweeting | 2c54e55697 | prefer dom dump to singlefile for generating readability output | 2024-01-02 19:50:56 -08:00