Nick Sweeting
f59b6d4189
only add url-list lines that are real urls
2021-04-01 14:00:07 -04:00
Nick Sweeting
5d3a03b299
use stderr and hint in case of parser returning no urls instead of bare exception
2021-03-31 01:39:01 -04:00
Nick Sweeting
8ce93ff787
use KEY, NAME, and PARSER to define parsers instead of hardcoding in init
2021-03-31 01:05:49 -04:00
Nick Sweeting
36f0646501
Merge pull request #669 from FliegendeWurst/fix-issue-235
add command: --parser option (fixes #235)
2021-03-31 00:53:47 -04:00
FliegendeWurst
60bd9a902e
add command: --parser option
2021-03-28 10:09:11 +02:00
Nick Sweeting
5fb9ca389f
check more url parsing invariants on startup
2021-03-27 03:57:22 -04:00
Nick Sweeting
d6de04a83a
fix lgtm errors
2021-01-30 06:07:35 -05:00
Nick Sweeting
a0a79cead8
move utils and vendored libs into subfolders
2020-12-06 02:01:18 +02:00
mAAdhaTTah
ac7ad9e942
Add parser for Pocket API
Pass a url like `pocket://Username` to import that username's archived Pocket
library. Tokens need to be stored in ArchiveBox.conf with the following keys:
```
POCKET_CONSUMER_KEY = key-from-custom-pocket-app
POCKET_ACCESS_TOKENS = {"YourUsername": "pocket-token-for-app"}
```
`POCKET_ACCESS_TOKENS` MUST be on a single line, or the JSON will be
misinterpreted by the parser as a new key/value pair.
2020-12-04 22:54:39 -05:00
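The single-line requirement in commit ac7ad9e942 above can be illustrated with a minimal sketch. It assumes ArchiveBox.conf is read as an INI-style file; the section name `DEPENDENCY_CONFIG` and the helper `load_pocket_tokens` are hypothetical and not taken from the ArchiveBox source, which may load its config differently.

```python
import configparser
import json

# Hypothetical ArchiveBox.conf contents; the section name is an assumption.
SINGLE_LINE_CONF = """
[DEPENDENCY_CONFIG]
POCKET_CONSUMER_KEY = key-from-custom-pocket-app
POCKET_ACCESS_TOKENS = {"YourUsername": "pocket-token-for-app"}
"""

# The same JSON spread over several unindented lines.
MULTI_LINE_CONF = """
[DEPENDENCY_CONFIG]
POCKET_CONSUMER_KEY = key-from-custom-pocket-app
POCKET_ACCESS_TOKENS = {
"YourUsername": "pocket-token-for-app"
}
"""


def load_pocket_tokens(conf_text: str) -> dict:
    """Parse INI text and decode the POCKET_ACCESS_TOKENS value as JSON."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names case-sensitive
    parser.read_string(conf_text)
    section = parser[parser.sections()[0]]
    return json.loads(section["POCKET_ACCESS_TOKENS"])


def loads_cleanly(conf_text: str) -> bool:
    """Return True if the tokens can be read, False if parsing fails."""
    try:
        load_pocket_tokens(conf_text)
        return True
    except (configparser.Error, json.JSONDecodeError):
        return False


tokens = load_pocket_tokens(SINGLE_LINE_CONF)
single_ok = loads_cleanly(SINGLE_LINE_CONF)
multi_ok = loads_cleanly(MULTI_LINE_CONF)
```

In this sketch the single-line form decodes to a username-to-token mapping, while the multi-line form fails because an INI parser treats each unindented line as a separate entry rather than as part of the JSON value.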
Emmanuel Hainry
aebc83659d
Add parser for Wallabag Atom feeds
2020-10-18 11:20:07 +02:00
Angel Rey
2c62abb270
Replaced os.path in init parsers
2020-10-02 15:46:39 -05:00
apkallum
594d9e49ce
first attempt to migrate to Pathlib
2020-09-17 09:09:52 -05:00
Nick Sweeting
61ab952dab
fix parser docstring
2020-08-18 09:20:05 -04:00
Nick Sweeting
15efb2d5ed
new generic_html parser for extracting hrefs
2020-08-18 08:29:05 -04:00
Nick Sweeting
a682a9c478
make all parsers accept arbitrary meta kwargs
2020-08-18 08:27:47 -04:00
Nick Sweeting
2e2b4f8150
fix url is too long to be a path error
2020-08-18 08:23:57 -04:00
Nick Sweeting
e3ac4c2405
htmldecode downloaded sources before parsing for links
2020-08-18 08:23:20 -04:00
Cristian
c073ea141d
feat: Initial oneshot command proposal
2020-07-29 11:19:06 -05:00
Nick Sweeting
3fe7a9b70c
also parse and archive sub-urls in generic_txt input
2020-07-27 18:52:57 -04:00
Cristian
6006b4f93b
refactor: Organize code to remove flake8 issues
2020-07-24 12:25:25 -05:00
Cristian
a5550b2105
fix: Rename logging folder to avoid naming conflicts (and circular import issues)
2020-07-22 11:02:13 -05:00
Cristian
f4d1b5121e
refactor: Move logging.py to main module to avoid circular import issues
2020-07-17 18:00:04 -05:00
Nick Sweeting
d3bfa98a91
fix depth flag and tweak logging
2020-07-13 11:26:34 -04:00
Nick Sweeting
96b1e4a8ec
accept local paths as valid link URLs when parsing
2020-07-13 11:22:58 -04:00
Nick Sweeting
cb67b09f9d
Merge branch 'master' into django
2020-06-25 21:30:29 -04:00
Nick Sweeting
204de37eb9
fix parsing errors for older archive index formats
2019-05-01 02:28:48 -04:00
Nick Sweeting
95007d9137
split up utils into separate files
2019-04-30 23:13:04 -04:00
Nick Sweeting
1b8abc0961
move everything out of legacy folder
2019-04-27 17:26:24 -04:00