17 commits

Author         SHA1        Message  Date
apkallum       594d9e49ce  first attempt to migrate to Pathlib  2020-09-17 09:09:52 -05:00
Nick Sweeting  61ab952dab  fix parser docstring  2020-08-18 09:20:05 -04:00
Nick Sweeting  15efb2d5ed  new generic_html parser for extracting hrefs  2020-08-18 08:29:05 -04:00
Nick Sweeting  a682a9c478  make all parsers accept arbitrary meta kwargs  2020-08-18 08:27:47 -04:00
Nick Sweeting  2e2b4f8150  fix url is too long to be a path error  2020-08-18 08:23:57 -04:00
Nick Sweeting  e3ac4c2405  htmldecode downloaded sources before parsing for links  2020-08-18 08:23:20 -04:00
Cristian       c073ea141d  feat: Initial oneshot command proposal  2020-07-29 11:19:06 -05:00
Nick Sweeting  3fe7a9b70c  also parse and archive sub-urls in generic_txt input  2020-07-27 18:52:57 -04:00
Cristian       6006b4f93b  refactor: Organize code to remove flake8 issues  2020-07-24 12:25:25 -05:00
Cristian       a5550b2105  fix: Rename logging folder to avoid naming conflicts (and circular import issues)  2020-07-22 11:02:13 -05:00
Cristian       f4d1b5121e  refactor: Move logging.py to main module to avoid circular import issues  2020-07-17 18:00:04 -05:00
Nick Sweeting  d3bfa98a91  fix depth flag and tweak logging  2020-07-13 11:26:34 -04:00
Nick Sweeting  96b1e4a8ec  accept local paths as valid link URLs when parsing  2020-07-13 11:22:58 -04:00
Nick Sweeting  cb67b09f9d  Merge branch 'master' into django  2020-06-25 21:30:29 -04:00
Nick Sweeting  204de37eb9  fix parsing errors for older archive index formats  2019-05-01 02:28:48 -04:00
Nick Sweeting  95007d9137  split up utils into separate files  2019-04-30 23:13:04 -04:00
Nick Sweeting  1b8abc0961  move everything out of legacy folder  2019-04-27 17:26:24 -04:00