| Nick Sweeting | 457c42bf84 | load EXTRACTORS dynamically using importlib.import_module | 2024-05-11 22:28:59 -07:00 |
| Nick Sweeting | c1fd2cfa42 | tag URLs immediately once added instead of waiting until archival completes | 2024-01-03 20:31:46 -08:00 |
| Nick Sweeting | 78d942ac22 | show more detail in readabiliity error messages | 2024-01-03 20:09:31 -08:00 |
| Nick Sweeting | 5b07a1126c | add comment about why DOM is preferred over singlefile for readability parsing | 2024-01-03 19:09:24 -08:00 |
| Nick Sweeting | 2c54e55697 | prefer dom dump to singlefile for generating readability output | 2024-01-02 19:50:56 -08:00 |
| Nick Sweeting | 82d8662c74 | add more readability error output | 2023-10-20 04:14:28 -07:00 |
| prnake | 011bd104cb | remove unused import | 2022-02-09 10:48:51 +08:00 |
| papersnake | de8e22efb7 | improve title extractor | 2022-02-08 23:17:52 +08:00 |
| Nick Sweeting | eb4d3bca9d | Update readability.py | 2021-05-13 00:13:32 -04:00 |
| Nick Sweeting | a9986f1f05 | add timezone support, tons of CSS and layout improvements, more detailed snapshot admin form info, ability to sort by recently updated, better grid view styling, better table layouts, better dark mode support | 2021-04-10 04:21:36 -04:00 |
| Nick Sweeting | bd6d9c165b | enforce utf8 on literally all file operations because windows sucks | 2021-03-27 01:16:29 -04:00 |
| Nick Sweeting | acb932ba12 | improve readability and mercury error handling and fix output path to be relative | 2021-02-16 15:53:11 -05:00 |
| Nick Sweeting | d0f8a5e710 | change mercury atomic_write output order | 2021-02-16 06:19:16 -05:00 |
| Dan Arnfield | 5420903102 | Refactor should_save_extractor methods to accept overwrite parameter | 2021-01-21 15:56:32 -06:00 |
| JDC | b1f70b2197 | Initial implementation | 2020-12-06 01:12:45 +02:00 |
| Nick Sweeting | a645f36b87 | add comment about fake cmd | 2020-09-01 19:42:22 -04:00 |
| Cristian | 66037535fd | feat: Add curl command on readability as default command to debug | 2020-09-01 10:16:24 -05:00 |
| Cristian | bf3ea42141 | fix: Add a default cmd value to handle case where the html cannot be retrieved | 2020-08-27 09:51:33 -05:00 |
| Nick Sweeting | a2c158e43e | catch OSErrors due to missing path | 2020-08-18 19:09:45 -04:00 |
| Nick Sweeting | 7144e0bdce | search for node dependencies in output dir first | 2020-08-18 18:40:19 -04:00 |
| Nick Sweeting | 92de20af15 | better detect missing dependencies on startup | 2020-08-18 04:38:13 -04:00 |
| Cristian | 05c71fc302 | fix: Organize readability extractor so a timeout does not break the whole process | 2020-08-17 08:34:40 -05:00 |
| Nick Sweeting | 03b73bfe77 | Update archivebox/extractors/readability.py | 2020-08-14 12:55:22 -04:00 |
| Cristian | 5dc7e63792 | feat: Update dockerfile to support readability | 2020-08-11 11:52:43 -05:00 |
| Cristian | 2a68af1b94 | tests: Add readability tests | 2020-08-11 11:15:15 -05:00 |
| Cristian | 8aa7b34de7 | tests: Add readability to ignored methods in tests | 2020-08-11 08:58:49 -05:00 |
| Cristian | dc87d8b68c | tests: Update failing tests | 2020-08-11 08:48:13 -05:00 |
| Cristian | 0ec747f64e | feat: Look in wget, singlefile or dom outputs before attempting to download the information again | 2020-08-11 08:37:12 -05:00 |
| Cristian | a14762640e | feat: Avoid running readability when the target is a file | 2020-08-11 08:37:12 -05:00 |
| Cristian | 61e08a7c43 | docs: Update docs link | 2020-08-11 08:37:12 -05:00 |
| Cristian | b33c66a9f7 | feat: Split output of readability into multiple files | 2020-08-11 08:37:12 -05:00 |
| Cristian | 7e2b249388 | feat: Initial version of readability extractor | 2020-08-11 08:37:12 -05:00 |