| Nick Sweeting | 950b5cbbb6 | Merge pull request #924 from prnake/dev (improve title extractor) | 2022-05-09 18:38:12 -07:00 |
| Nick Sweeting | 57df65f28f | use yt-dlp for media archiving instead of youtube-dl | 2022-04-21 07:11:35 -07:00 |
| prnake | 011bd104cb | remove unused import | 2022-02-09 10:48:51 +08:00 |
| papersnake | de8e22efb7 | improve title extractor | 2022-02-08 23:17:52 +08:00 |
| Nick Sweeting | 4715ace7dd | ignore BaseException lgtm errors | 2021-05-31 20:59:05 -04:00 |
| Nick Sweeting | eb4d3bca9d | Update readability.py | 2021-05-13 00:13:32 -04:00 |
| Nick Sweeting | 62078a77f8 | show run duration after each archived link in cli output | 2021-04-10 07:52:01 -04:00 |
| Nick Sweeting | 193df5c8d3 | add video subtitles and description to full-text index | 2021-04-10 07:22:20 -04:00 |
| Nick Sweeting | a9986f1f05 | add timezone support, tons of CSS and layout improvements, more detailed snapshot admin form info, ability to sort by recently updated, better grid view styling, better table layouts, better dark mode support | 2021-04-10 04:21:36 -04:00 |
| Nick Sweeting | bd6d9c165b | enforce utf8 on literally all file operations because windows sucks | 2021-03-27 01:16:29 -04:00 |
| Nick Sweeting | 084cf7ff51 | add more explanation about snapshot.save timestamp bump | 2021-02-17 13:34:46 -05:00 |
| Nick Sweeting | acb932ba12 | improve readability and mercury error handling and fix output path to be relative | 2021-02-16 15:53:11 -05:00 |
| Nick Sweeting | c95698e608 | bump Snapshot.updated time after each extractor, change extractor order | 2021-02-16 15:52:18 -05:00 |
| Nick Sweeting | d0f8a5e710 | change mercury atomic_write output order | 2021-02-16 06:19:16 -05:00 |
| Nick Sweeting | 7d0f5653c3 | fix lgtm alerts | 2021-02-01 02:27:24 -05:00 |
| Nick Sweeting | 04c951cdd5 | fix alerts | 2021-02-01 02:22:02 -05:00 |
| Nick Sweeting | 846c966c4d | use globbing to find wget output path | 2021-01-30 22:02:39 -05:00 |
| Nick Sweeting | e6fa16e13a | only chmod wget output if it exists | 2021-01-30 22:02:11 -05:00 |
| Nick Sweeting | 385daf9af8 | save the url as title for staticfiles or non html files | 2021-01-30 22:01:49 -05:00 |
| Nick Sweeting | b9b1c3d9e8 | fix singlefile output path not relative | 2021-01-30 20:44:49 -05:00 |
| Nick Sweeting | d6de04a83a | fix lgtm errors | 2021-01-30 06:07:35 -05:00 |
| Nick Sweeting | c2aaa41c76 | fix missing str path | 2021-01-30 01:25:08 -05:00 |
| Nick Sweeting | 15e58bd366 | fix using os.path calls on pathlib paths | 2021-01-27 11:27:40 -05:00 |
| Nick Sweeting | 9764a8ed9b | check for non html files from wget | 2021-01-25 18:15:16 -05:00 |
| Dan Arnfield | 5420903102 | Refactor should_save_extractor methods to accept overwrite parameter | 2021-01-21 15:56:32 -06:00 |
| Nick Sweeting | ef7711ffa0 | fix cookies file arg is path | 2021-01-20 19:13:53 -05:00 |
| Cristian | 6031ffa3b2 | fix: Mercury extractor error was incorrectly initialized | 2021-01-07 09:22:46 -05:00 |
| Cristian | e9e4adfc34 | fix: wget_output_path failing on some extractors. Add a new condition | 2021-01-07 09:07:29 -05:00 |
| Cristian | 81d766aba1 | refactor: Remove setup_django from title.py | 2020-12-11 16:03:50 -05:00 |
| Cristian | 275ad22db7 | refactor: Remove skip_index from archive related functions | 2020-12-08 18:42:25 -05:00 |
| Cristian | f6c73f9aeb | fix: Issue with oneshot command | 2020-12-08 18:42:25 -05:00 |
| JDC | 7903db6dfb | Add ArchiveResult Manager and sorted indexable filter | 2020-12-06 01:13:39 +02:00 |
| JDC | b1f70b2197 | Initial implementation | 2020-12-06 01:12:45 +02:00 |
| Cristian | 33182fd53c | fix: Add missing assignation | 2020-11-04 15:07:45 -05:00 |
| Cristian | d064a3eeff | fix: Handle case when update tries to re-add a link that is not in the sql index | 2020-11-04 15:02:54 -05:00 |
| Cristian | f292cface2 | fix: Add condition for oneshot when archiving links | 2020-11-04 14:40:44 -05:00 |
| Cristian | 4484491fb7 | feat: Create ArchiveResult after finishing an extractor process | 2020-11-04 11:22:55 -05:00 |
| Cristian | ac0ec160d1 | lint: Fix warnings in master branch | 2020-11-02 08:51:48 -05:00 |
| Nick Sweeting | ac9e0e356d | config fixes | 2020-10-31 07:57:11 -04:00 |
| Nick Sweeting | 18355dc2c6 | clean up config loading in settings and config file layout | 2020-10-31 03:08:03 -04:00 |
| Cristian | e7e33ea7a5 | tests: Add tests for several different ways to extract the title | 2020-10-30 08:04:26 -05:00 |
| Nick Sweeting | f727ece7b3 | add regex fallback back to title parser | 2020-10-30 04:57:31 -04:00 |
| Nick Sweeting | 79bef1384e | Merge pull request #493 from ttimasdf/feat-ogtitle (Feature: add og:title metadata as alternative title) | 2020-10-30 04:51:14 -04:00 |
| Cristian | c12fe0e3d7 | feat: Use CURL_ARGS on title extractor | 2020-10-22 08:46:16 -05:00 |
| Cristian | 563d0f94ec | feat: Use CURL_ARGS in favicon extractor | 2020-10-22 08:46:16 -05:00 |
| Cristian | 2e1cdca789 | feat: Use CURL_ARGS on header extractor | 2020-10-22 08:46:16 -05:00 |
| Cristian | 972d57bd08 | feat: Add CURL_ARGS to control curl arguments | 2020-10-22 08:46:16 -05:00 |
| Cristian | 24e7a74855 | feat: Add WGET_ARGS to control wget arguments | 2020-10-22 08:46:16 -05:00 |
| Cristian | bc02e0ffe3 | feat: Add config for youtubedl (YOUTUBEDL_ARGS) | 2020-10-22 08:46:16 -05:00 |
| Angel Rey | ce71747538 | replaced os.path in init extractors | 2020-10-02 15:46:39 -05:00 |