Commit graph

46 commits

Author | SHA1 | Message | Date
Nick Sweeting | c95698e608 | bump Snapshot.updated time after each extractor, change extractor order | 2021-02-16 15:52:18 -05:00
Dan Arnfield | 5420903102 | Refactor should_save_extractor methods to accept overwrite parameter | 2021-01-21 15:56:32 -06:00
Cristian | 275ad22db7 | refactor: Remove skip_index from archive related functions | 2020-12-08 18:42:25 -05:00
Cristian | f6c73f9aeb | fix: Issue with oneshot command | 2020-12-08 18:42:25 -05:00
JDC | 7903db6dfb | Add ArchiveResult Manager and sorted indexable filter | 2020-12-06 01:13:39 +02:00
JDC | b1f70b2197 | Initial implementation | 2020-12-06 01:12:45 +02:00
Cristian | 33182fd53c | fix: Add missing assignation | 2020-11-04 15:07:45 -05:00
Cristian | d064a3eeff | fix: Handle case when update tries to re-add a link that is not in the sql index | 2020-11-04 15:02:54 -05:00
Cristian | f292cface2 | fix: Add condition for oneshot when archiving links | 2020-11-04 14:40:44 -05:00
Cristian | 4484491fb7 | feat: Create ArchiveResult after finishing an extractor process | 2020-11-04 11:22:55 -05:00
Angel Rey | ce71747538 | replaced os.path in init extractors | 2020-10-02 15:46:39 -05:00
Cristian | 7d3767b882 | fix: oneshot command not running extractors | 2020-09-24 12:56:16 -05:00
Angel Rey | 852e3c9cff | Added headers extractor | 2020-09-23 11:07:00 -05:00
ttimasdf | 357b677363 | fix: add mercury-parser to extractors list | 2020-09-22 18:44:12 -05:00
Cristian | b18bbf8874 | test: Fix tests post-rebase | 2020-09-17 09:09:52 -05:00
Cristian | 50f3f16203 | lint: Remove unused import | 2020-09-15 08:05:46 -05:00
Cristian | 0a83392cbf | fix: Replace any typing with Union[Iterable[Link], QuerySet] in archive_links | 2020-09-15 08:05:46 -05:00
Cristian | 018bd91745 | refactor: Remove get_iter lambda from archive_links | 2020-09-15 08:05:46 -05:00
Cristian | 01fb44fd40 | refactor: Change archive_links check to focus on queryset, so it allows other iterables and not just lists | 2020-09-15 08:05:46 -05:00
Cristian | fe9604a772 | feat: Add tests for remove command | 2020-09-15 08:05:46 -05:00
Cristian | be520d137a | feat: Refactor add method to use querysets | 2020-09-15 08:05:46 -05:00
Cristian | 874403e667 | feat: Remove patch_main_index | 2020-09-15 08:05:46 -05:00
Cristian | 31343c1367 | feat: Update extractors and add command to use sql index as source of truth | 2020-09-15 08:05:46 -05:00
Nick Sweeting | e87f1d57a3 | fix linters | 2020-08-18 09:22:12 -04:00
Nick Sweeting | c9b3bab84d | fix pull title not working | 2020-08-18 08:49:26 -04:00
Nick Sweeting | b0c0a676f8 | re-enable readability and singlefile by default now that its less noisy | 2020-08-18 08:29:46 -04:00
Nick Sweeting | d7d53cfb12 | dont show skipped extractors to reduce visual noise | 2020-08-18 08:13:35 -04:00
Nick Sweeting | b681a477ae | add overwrite flag to add command to force re-archiving | 2020-08-18 04:37:54 -04:00
Nick Sweeting | 58e928520a | tweak log output for skipped methods | 2020-08-14 13:12:50 -04:00
Cristian | b7aa3df8d2 | feat: Disable singlefile and readability by default | 2020-08-12 14:42:21 -05:00
Cristian | 0ec747f64e | feat: Look in wget, singlefile or dom outputs before attempting to download the information again | 2020-08-11 08:37:12 -05:00
Cristian | 7e2b249388 | feat: Initial version of readability extractor | 2020-08-11 08:37:12 -05:00
Cristian | 853685668c | feat: Add initial support for singlefile extractor | 2020-08-03 13:22:06 -05:00
Cristian | e6c571beb2 | fix: Remove title from extractors for oneshot | 2020-07-31 10:24:58 -05:00
Cristian | 8bcb171e74 | fix: Remove support for multiple urls in oneshot command | 2020-07-31 09:05:40 -05:00
Cristian | 3afb2401bc | fix: Add condition to avoid breaking the add command | 2020-07-29 11:53:49 -05:00
Cristian | c073ea141d | feat: Initial oneshot command proposal | 2020-07-29 11:19:06 -05:00
Nick Sweeting | 2e0b751376 | accept methods argument to filder archive_link | 2020-07-28 05:58:38 -04:00
Nick Sweeting | af9084ee95 | update Snapshot.title to latest_title after fetching | 2020-07-28 05:55:09 -04:00
Nick Sweeting | 943453a9a8 | pass overwrite properly | 2020-07-28 05:54:42 -04:00
Cristian | a5550b2105 | fix: Rename logging folder to avoid naming conflicts (and circular import issues) | 2020-07-22 11:02:13 -05:00
Cristian | f4d1b5121e | refactor: Move logging.py to main module to avoid circular import issues | 2020-07-17 18:00:04 -05:00
Nick Sweeting | b4ce20cbe5 | write link details json before and after archiving | 2020-07-13 11:41:27 -04:00
Nick Sweeting | d3bfa98a91 | fix depth flag and tweak logging | 2020-07-13 11:26:34 -04:00
Nick Sweeting | 602e141f08 | fix config file atomic writing bugs | 2020-06-30 02:04:16 -04:00
Nick Sweeting | 1b8abc0961 | move everything out of legacy folder | 2019-04-27 17:26:24 -04:00