Commit graph

64 commits

Author SHA1 Message Date
Cristian
018bd91745 refactor: Remove get_iter lambda from archive_links 2020-09-15 08:05:46 -05:00
Cristian
01fb44fd40 refactor: Change archive_links check to focus on queryset, so it allows other iterables and not just lists 2020-09-15 08:05:46 -05:00
Cristian
fe9604a772 feat: Add tests for remove command 2020-09-15 08:05:46 -05:00
Cristian
be520d137a feat: Refactor add method to use querysets 2020-09-15 08:05:46 -05:00
Cristian
874403e667 feat: Remove patch_main_index 2020-09-15 08:05:46 -05:00
Cristian
31343c1367 feat: Update extractors and add command to use sql index as source of truth 2020-09-15 08:05:46 -05:00
Cristian
bd3c824d45 fix: Escape JSON output on command failure so the user can run the command manually 2020-09-04 10:23:41 -05:00
Nick Sweeting
a645f36b87 add comment about fake cmd 2020-09-01 19:42:22 -04:00
Cristian
66037535fd feat: Add curl command on readability as default command to debug 2020-09-01 10:16:24 -05:00
Cristian
bf3ea42141 fix: Add a default cmd value to handle case where the html cannot be retrieved 2020-08-27 09:51:33 -05:00
Nick Sweeting
a2c158e43e catch OSErrors due to missing path 2020-08-18 19:09:45 -04:00
Nick Sweeting
7144e0bdce search for node dependencies in output dir first 2020-08-18 18:40:19 -04:00
Nick Sweeting
e87f1d57a3 fix linters 2020-08-18 09:22:12 -04:00
Nick Sweeting
c9b3bab84d fix pull title not working 2020-08-18 08:49:26 -04:00
Nick Sweeting
b0c0a676f8 re-enable readability and singlefile by default now that its less noisy 2020-08-18 08:29:46 -04:00
Nick Sweeting
d7d53cfb12 dont show skipped extractors to reduce visual noise 2020-08-18 08:13:35 -04:00
Nick Sweeting
92de20af15 better detect missing dependencies on startup 2020-08-18 04:38:13 -04:00
Nick Sweeting
b681a477ae add overwrite flag to add command to force re-archiving 2020-08-18 04:37:54 -04:00
Cristian
05c71fc302 fix: Organize readability extractor so a timeout does not break the whole process 2020-08-17 08:34:40 -05:00
Nick Sweeting
58e928520a tweak log output for skipped methods 2020-08-14 13:12:50 -04:00
Nick Sweeting
03b73bfe77 Update archivebox/extractors/readability.py 2020-08-14 12:55:22 -04:00
Cristian
b7aa3df8d2 feat: Disable singlefile and readability by default 2020-08-12 14:42:21 -05:00
Cristian
5dc7e63792 feat: Update dockerfile to support readability 2020-08-11 11:52:43 -05:00
Cristian
2a68af1b94 tests: Add readability tests 2020-08-11 11:15:15 -05:00
Cristian
8aa7b34de7 tests: Add readability to ignored methods in tests 2020-08-11 08:58:49 -05:00
Cristian
dc87d8b68c tests: Update failing tests 2020-08-11 08:48:13 -05:00
Cristian
0ec747f64e feat: Look in wget, singlefile or dom outputs before attempting to download the information again 2020-08-11 08:37:12 -05:00
Cristian
a14762640e feat: Avoid running readability when the target is a file 2020-08-11 08:37:12 -05:00
Cristian
61e08a7c43 docs: Update docs link 2020-08-11 08:37:12 -05:00
Cristian
b33c66a9f7 feat: Split output of readability into multiple files 2020-08-11 08:37:12 -05:00
Cristian
7e2b249388 feat: Initial version of readability extractor 2020-08-11 08:37:12 -05:00
Nick Sweeting
430be7bc68 add missing staticfile check to singlefile 2020-08-10 13:42:20 -04:00
Cristian
06d0e9de6c feat: Add support for singlefile in docker 2020-08-03 13:23:05 -05:00
Nick Sweeting
5b6eb5e4ad make filenames consistent with program name 2020-08-03 13:23:05 -05:00
Cristian
42b0c80465 feat: Add singlefile to link_details 2020-08-03 13:22:06 -05:00
Cristian
787a5ad43e fix: Commit code review suggestions 2020-08-03 13:22:06 -05:00
Cristian
853685668c feat: Add initial support for singlefile extractor 2020-08-03 13:22:06 -05:00
Cristian
e6c571beb2 fix: Remove title from extractors for oneshot 2020-07-31 10:24:58 -05:00
Cristian
8bcb171e74 fix: Remove support for multiple urls in oneshot command 2020-07-31 09:05:40 -05:00
Cristian
3afb2401bc fix: Add condition to avoid breaking the add command 2020-07-29 11:53:49 -05:00
Cristian
c073ea141d feat: Initial oneshot command proposal 2020-07-29 11:19:06 -05:00
Nick Sweeting
2e0b751376 accept methods argument to filter archive_link 2020-07-28 05:58:38 -04:00
Nick Sweeting
032c2458de add missing setup_django import 2020-07-28 05:58:13 -04:00
Nick Sweeting
55a237a435 also set snapshot title inside of fetch_title directly 2020-07-28 05:56:34 -04:00
Nick Sweeting
273059f054 accept gzipped responses when using curl 2020-07-28 05:55:54 -04:00
Nick Sweeting
af9084ee95 update Snapshot.title to latest_title after fetching 2020-07-28 05:55:09 -04:00
Nick Sweeting
943453a9a8 pass overwrite properly 2020-07-28 05:54:42 -04:00
Cristian
a5550b2105 fix: Rename logging folder to avoid naming conflicts (and circular import issues) 2020-07-22 11:02:13 -05:00
Nick Sweeting
0965031d8f fix archive_org header rename 2020-07-22 01:46:38 -04:00
Cristian
f4d1b5121e refactor: Move logging.py to main module to avoid circular import issues 2020-07-17 18:00:04 -05:00