ふぁ
|
d77c770c47
|
add CHROME_TIMEOUT args
Signed-off-by: ふぁ <yuki@yuki0311.com>
|
2023-03-14 20:29:41 +09:00 |
|
Nick Sweeting
|
9599845b56
|
ensure DOM HTML dump is non-zero length file when retrying
|
2023-03-13 10:49:26 +00:00 |
|
Nick Sweeting
|
0cbeeb4346
|
Merge pull request #1021 from renaisun/dev
|
2023-01-09 18:17:39 -08:00 |
|
Joseph Turian
|
07de4a79a1
|
Merge branch 'dev' into feature/kludge-984-UTF8-bug
|
2022-12-20 11:39:01 +01:00 |
|
Joseph Turian
|
081a12b079
|
Add ts
|
2022-09-12 21:32:47 +00:00 |
|
Joseph Turian
|
daef48e59b
|
flake8
|
2022-09-12 21:31:33 +00:00 |
|
Joseph Turian
|
983f485cc0
|
flake8
|
2022-09-12 21:29:43 +00:00 |
|
Joseph Turian
|
b864c38d9e
|
Don't be strict on unicode errors
|
2022-09-12 20:40:45 +00:00 |
|
Joseph Turian
|
dba423a568
|
A few more youtube-dl tweaks
|
2022-09-12 20:36:23 +00:00 |
|
Joseph Turian
|
f5f7aff3b4
|
Added yt-dlp everywhere
|
2022-09-12 20:34:02 +00:00 |
|
renaisun
|
0ea955b3ed
|
add a missing comma
|
2022-09-12 09:08:28 +08:00 |
|
notevenaperson
|
40659b5e9d
|
singlefile.py: Code to ensure options are deduplicated
|
2022-09-12 09:08:28 +08:00 |
|
Joseph Turian
|
2b58cce43f
|
Attempted to warn on #984 and #1014
|
2022-09-11 12:19:16 +02:00 |
|
renaisun
|
8899fe0b92
|
Add SINGLEFILE_ARGS to control single-file arguments
|
2022-06-09 14:35:48 +08:00 |
|
Nick Sweeting
|
950b5cbbb6
|
Merge pull request #924 from prnake/dev
improve title extractor
|
2022-05-09 18:38:12 -07:00 |
|
Nick Sweeting
|
57df65f28f
|
use yt-dlp for media archiving instead of youtube-dl
|
2022-04-21 07:11:35 -07:00 |
|
prnake
|
011bd104cb
|
remove unused import
|
2022-02-09 10:48:51 +08:00 |
|
papersnake
|
de8e22efb7
|
improve title extractor
|
2022-02-08 23:17:52 +08:00 |
|
Nick Sweeting
|
4715ace7dd
|
ignore BaseException lgtm errors
|
2021-05-31 20:59:05 -04:00 |
|
Nick Sweeting
|
eb4d3bca9d
|
Update readability.py
|
2021-05-13 00:13:32 -04:00 |
|
Nick Sweeting
|
62078a77f8
|
show run duration after each archived link in cli output
|
2021-04-10 07:52:01 -04:00 |
|
Nick Sweeting
|
193df5c8d3
|
add video subtitles and description to full-text index
|
2021-04-10 07:22:20 -04:00 |
|
Nick Sweeting
|
a9986f1f05
|
add timezone support, tons of CSS and layout improvements, more detailed snapshot admin form info, ability to sort by recently updated, better grid view styling, better table layouts, better dark mode support
|
2021-04-10 04:21:36 -04:00 |
|
Nick Sweeting
|
bd6d9c165b
|
enforce utf8 on literally all file operations because windows sucks
|
2021-03-27 01:16:29 -04:00 |
|
Nick Sweeting
|
084cf7ff51
|
add more explanation about snapshot.save timestamp bump
|
2021-02-17 13:34:46 -05:00 |
|
Nick Sweeting
|
acb932ba12
|
improve readability and mercury error handling and fix output path to be relative
|
2021-02-16 15:53:11 -05:00 |
|
Nick Sweeting
|
c95698e608
|
bump Snapshot.updated time after each extractor, change extractor order
|
2021-02-16 15:52:18 -05:00 |
|
Nick Sweeting
|
d0f8a5e710
|
change mercury atomic_write output order
|
2021-02-16 06:19:16 -05:00 |
|
Nick Sweeting
|
7d0f5653c3
|
fix lgtm alerts
|
2021-02-01 02:27:24 -05:00 |
|
Nick Sweeting
|
04c951cdd5
|
fix alerts
|
2021-02-01 02:22:02 -05:00 |
|
Nick Sweeting
|
846c966c4d
|
use globbing to find wget output path
|
2021-01-30 22:02:39 -05:00 |
|
Nick Sweeting
|
e6fa16e13a
|
only chmod wget output if it exists
|
2021-01-30 22:02:11 -05:00 |
|
Nick Sweeting
|
385daf9af8
|
save the url as title for staticfiles or non html files
|
2021-01-30 22:01:49 -05:00 |
|
Nick Sweeting
|
b9b1c3d9e8
|
fix singlefile output path not relative
|
2021-01-30 20:44:49 -05:00 |
|
Nick Sweeting
|
d6de04a83a
|
fix lgtm errors
|
2021-01-30 06:07:35 -05:00 |
|
Nick Sweeting
|
c2aaa41c76
|
fix missing str path
|
2021-01-30 01:25:08 -05:00 |
|
Nick Sweeting
|
15e58bd366
|
fix using os.path calls on pathlib paths
|
2021-01-27 11:27:40 -05:00 |
|
Nick Sweeting
|
9764a8ed9b
|
check for non html files from wget
|
2021-01-25 18:15:16 -05:00 |
|
Dan Arnfield
|
5420903102
|
Refactor should_save_extractor methods to accept overwrite parameter
|
2021-01-21 15:56:32 -06:00 |
|
Nick Sweeting
|
ef7711ffa0
|
fix cookies file arg is path
|
2021-01-20 19:13:53 -05:00 |
|
Cristian
|
6031ffa3b2
|
fix: Mercury extractor error was incorrectly initialized
|
2021-01-07 09:22:46 -05:00 |
|
Cristian
|
e9e4adfc34
|
fix: wget_output_path failing on some extractors. Add a new condition
|
2021-01-07 09:07:29 -05:00 |
|
Cristian
|
81d766aba1
|
refactor: Remove setup_django from title.py
|
2020-12-11 16:03:50 -05:00 |
|
Cristian
|
275ad22db7
|
refactor: Remove skip_index from archive related functions
|
2020-12-08 18:42:25 -05:00 |
|
Cristian
|
f6c73f9aeb
|
fix: Issue with oneshot command
|
2020-12-08 18:42:25 -05:00 |
|
JDC
|
7903db6dfb
|
Add ArchiveResult Manager and sorted indexable filter
|
2020-12-06 01:13:39 +02:00 |
|
JDC
|
b1f70b2197
|
Initial implementation
|
2020-12-06 01:12:45 +02:00 |
|
Cristian
|
33182fd53c
|
fix: Add missing assignation
|
2020-11-04 15:07:45 -05:00 |
|
Cristian
|
d064a3eeff
|
fix: Handle case when update tries to re-add a link that is not in the sql index
|
2020-11-04 15:02:54 -05:00 |
|
Cristian
|
f292cface2
|
fix: Add condition for oneshot when archiving links
|
2020-11-04 14:40:44 -05:00 |
|
Cristian
|
4484491fb7
|
feat: Create ArchiveResult after finishing an extractor process
|
2020-11-04 11:22:55 -05:00 |
|
Cristian
|
ac0ec160d1
|
lint: Fix warnings in master branch
|
2020-11-02 08:51:48 -05:00 |
|
Nick Sweeting
|
ac9e0e356d
|
config fixes
|
2020-10-31 07:57:11 -04:00 |
|
Nick Sweeting
|
18355dc2c6
|
clean up config loading in settings and config file layout
|
2020-10-31 03:08:03 -04:00 |
|
Cristian
|
e7e33ea7a5
|
tests: Add tests for several different ways to extract the title
|
2020-10-30 08:04:26 -05:00 |
|
Nick Sweeting
|
f727ece7b3
|
add regex fallback back to title parser
|
2020-10-30 04:57:31 -04:00 |
|
Nick Sweeting
|
79bef1384e
|
Merge pull request #493 from ttimasdf/feat-ogtitle
Feature: add og:title metadata as alternative title
|
2020-10-30 04:51:14 -04:00 |
|
Cristian
|
c12fe0e3d7
|
feat: Use CURL_ARGS on title extractor
|
2020-10-22 08:46:16 -05:00 |
|
Cristian
|
563d0f94ec
|
feat: Use CURL_ARGS in favicon extractor
|
2020-10-22 08:46:16 -05:00 |
|
Cristian
|
2e1cdca789
|
feat: Use CURL_ARGS on header extractor
|
2020-10-22 08:46:16 -05:00 |
|
Cristian
|
972d57bd08
|
feat: Add CURL_ARGS to control curl arguments
|
2020-10-22 08:46:16 -05:00 |
|
Cristian
|
24e7a74855
|
feat: Add WGET_ARGS to control wget arguments
|
2020-10-22 08:46:16 -05:00 |
|
Cristian
|
bc02e0ffe3
|
feat: Add config for youtubedl (YOUTUBEDL_ARGS)
|
2020-10-22 08:46:16 -05:00 |
|
Angel Rey
|
ce71747538
|
replaced os.path in init extractors
|
2020-10-02 15:46:39 -05:00 |
|
Angel Rey
|
3fb410a604
|
Replaced os.path in favicon.py
|
2020-10-02 15:46:39 -05:00 |
|
ttimasdf
|
eda3836dee
|
feat: add og:title metadata as alternative title
|
2020-09-27 12:54:52 +08:00 |
|
Cristian
|
abde871a3c
|
fix: Wget absolute path generating issues
|
2020-09-25 08:24:06 -05:00 |
|
Cristian
|
7d3767b882
|
fix: oneshot command not running extractors
|
2020-09-24 12:56:16 -05:00 |
|
Cristian
|
62ed11a5ca
|
fix: Improve headers handling
|
2020-09-24 12:55:51 -05:00 |
|
Angel Rey
|
a40af98ced
|
removed static file check
|
2020-09-24 12:55:51 -05:00 |
|
Angel Rey
|
dc160daba8
|
Fixed lint
|
2020-09-23 11:07:00 -05:00 |
|
Angel Rey
|
7fd7dced9a
|
Added curl params
|
2020-09-23 11:07:00 -05:00 |
|
Angel Rey
|
852e3c9cff
|
Added headers extractor
|
2020-09-23 11:07:00 -05:00 |
|
Cristian
|
eb34a6af62
|
lint: Fix mercury extractor lint issues
|
2020-09-23 10:35:39 -05:00 |
|
Cristian
|
46b9e3d536
|
fix: Fix mercury extractor test
|
2020-09-23 10:34:05 -05:00 |
|
ttimasdf
|
357b677363
|
fix: add mercury-parser to extractors list
|
2020-09-22 18:44:12 -05:00 |
|
ttimasdf
|
706bd895e0
|
feat: Add mercury-parser
|
2020-09-22 18:44:12 -05:00 |
|
Cristian
|
b18bbf8874
|
test: Fix tests post-rebase
|
2020-09-17 09:09:52 -05:00 |
|
Cristian
|
50f3f16203
|
lint: Remove unused import
|
2020-09-15 08:05:46 -05:00 |
|
Cristian
|
0a83392cbf
|
fix: Replace any typing with Union[Iterable[Link], QuerySet] in archive_links
|
2020-09-15 08:05:46 -05:00 |
|
Cristian
|
018bd91745
|
refactor: Remove get_iter lambda from archive_links
|
2020-09-15 08:05:46 -05:00 |
|
Cristian
|
01fb44fd40
|
refactor: Change archive_links check to focus on queryset, so it allows other iterables and not just lists
|
2020-09-15 08:05:46 -05:00 |
|
Cristian
|
fe9604a772
|
feat: Add tests for remove command
|
2020-09-15 08:05:46 -05:00 |
|
Cristian
|
be520d137a
|
feat: Refactor add method to use querysets
|
2020-09-15 08:05:46 -05:00 |
|
Cristian
|
874403e667
|
feat: Remove patch_main_index
|
2020-09-15 08:05:46 -05:00 |
|
Cristian
|
31343c1367
|
feat: Update extractors and add command to use sql index as source of truth
|
2020-09-15 08:05:46 -05:00 |
|
Cristian
|
bd3c824d45
|
fix: Escape JSON output on command failure so the user can run the command manually
|
2020-09-04 10:23:41 -05:00 |
|
Nick Sweeting
|
a645f36b87
|
add comment about fake cmd
|
2020-09-01 19:42:22 -04:00 |
|
Cristian
|
66037535fd
|
feat: Add curl command on readability as default command to debug
|
2020-09-01 10:16:24 -05:00 |
|
Cristian
|
bf3ea42141
|
fix: Add a default cmd value to handle case where the html cannot be retrieved
|
2020-08-27 09:51:33 -05:00 |
|
Nick Sweeting
|
a2c158e43e
|
catch OSErrors due to missing path
|
2020-08-18 19:09:45 -04:00 |
|
Nick Sweeting
|
7144e0bdce
|
search for node dependencies in output dir first
|
2020-08-18 18:40:19 -04:00 |
|
Nick Sweeting
|
e87f1d57a3
|
fix linters
|
2020-08-18 09:22:12 -04:00 |
|
Nick Sweeting
|
c9b3bab84d
|
fix pull title not working
|
2020-08-18 08:49:26 -04:00 |
|
Nick Sweeting
|
b0c0a676f8
|
re-enable readability and singlefile by default now that its less noisy
|
2020-08-18 08:29:46 -04:00 |
|
Nick Sweeting
|
d7d53cfb12
|
dont show skipped extractors to reduce visual noise
|
2020-08-18 08:13:35 -04:00 |
|
Nick Sweeting
|
92de20af15
|
better detect missing dependencies on startup
|
2020-08-18 04:38:13 -04:00 |
|
Nick Sweeting
|
b681a477ae
|
add overwrite flag to add command to force re-archiving
|
2020-08-18 04:37:54 -04:00 |
|
Cristian
|
05c71fc302
|
fix: Organize readability extractor so a timeout does not break the whole process
|
2020-08-17 08:34:40 -05:00 |
|
Nick Sweeting
|
58e928520a
|
tweak log output for skipped methods
|
2020-08-14 13:12:50 -04:00 |
|