Commit graph

3802 commits

Author SHA1 Message Date
longzai
4ae765ec27 fix the URL_REGEX used in generic_html parsers
Signed-off-by: longzai <437172242@qq.com>
2024-04-08 04:53:05 +08:00
Nick Sweeting
9d4cc361e6
Update docker-compose.yml 2024-03-27 20:15:27 -07:00
Nick Sweeting
e48159b8a0 cleanup docker-compose by storing crontabs in data dir 2024-03-26 15:24:05 -07:00
Nick Sweeting
ac73fb5129 merge fixes 2024-03-26 15:22:40 -07:00
Nick Sweeting
b4c3aa5097 Merge branch 'main' into dev 2024-03-26 15:01:36 -07:00
Nick Sweeting
a4453b6f87
fix PERSONAS PERSONAS_DIR typo 2024-03-26 14:19:25 -07:00
Nick Sweeting
6981837a0b
Update to Django 4.2.x (#1388) 2024-03-26 14:14:42 -07:00
jim winstead
8b1b01e508 Update to Django 4.2.x, now in LTS until April 2026 2024-03-25 17:46:01 -07:00
Nick Sweeting
1d49bee90b
Update README.md 2024-03-21 00:31:48 -07:00
Nick Sweeting
0521379464
Update README.md 2024-03-21 00:29:54 -07:00
Nick Sweeting
ee2809eb4f
Update README.md 2024-03-21 00:27:49 -07:00
Nick Sweeting
88f21d0d70
Update README.md 2024-03-21 00:12:31 -07:00
Nick Sweeting
2c6704b1d0
Update README.md 2024-03-21 00:11:57 -07:00
Nick Sweeting
1dbe08872c
Update README.md 2024-03-21 00:10:19 -07:00
Nick Sweeting
2220a5350c
Update README.md 2024-03-21 00:02:08 -07:00
Nick Sweeting
a1ef5f6035
Update README.md 2024-03-21 00:00:14 -07:00
Nick Sweeting
28e85e0b95
Update README.md 2024-03-20 23:31:04 -07:00
Nick Sweeting
67baea172e
Update README.md 2024-03-20 23:28:02 -07:00
Nick Sweeting
d9beebdee7
Update README.md 2024-03-20 23:25:06 -07:00
Nick Sweeting
d32413d74b
Update README.md 2024-03-20 23:23:26 -07:00
Nick Sweeting
37c9a33c8b
Update README.md 2024-03-20 23:19:23 -07:00
Nick Sweeting
b921efb0e0
Revise md section not formatting properly in html (#1382) 2024-03-19 12:25:27 -07:00
Nicholas Hebert
e00845f58c
Revise md section not formatting properly in html 2024-03-19 11:13:47 -03:00
Nick Sweeting
8007e97c3f point archivebox to novnc display container by default 2024-03-18 14:41:57 -07:00
Nick Sweeting
c0b5dbcecb create new data/personas dir to hold cookies and chrome profiles 2024-03-18 14:41:39 -07:00
Nick Sweeting
c5bb99dce1 explicitly use Default profile inside user data dir 2024-03-18 14:40:40 -07:00
Nick Sweeting
1fc5d7c5c8 add USER_AGENT config option to set all USER_AGENTs at once 2024-03-18 14:39:09 -07:00
Nick Sweeting
0872c84ba7
Add generic_jsonl parser (#1370) 2024-03-14 20:19:21 -07:00
jim winstead
06a0580430 Merge branch 'issue-1369' of github.com:jimwins/ArchiveBox into issue-1369 2024-03-14 15:47:04 -07:00
jim winstead
5478d13d52 Add generic_jsonl parser
Resolves #1369
2024-03-14 15:42:29 -07:00
Nick Sweeting
ca2c484a8e
Add _EXTRA_ARGS for various extractors (#1360) 2024-03-14 01:55:09 -07:00
Nick Sweeting
48f4b12ae2
Use COOKIES_FILE to fetch page titles (#1364) 2024-03-14 01:52:44 -07:00
Nick Sweeting
099f7d00fe
Use feedparser for RSS parsing (#1362)
Fixes #1171
Fixes #870 (probably, would need to test against a Wallabag Atom file to
Fixes #135
Fixes #123
Fixes #106
2024-03-14 01:51:45 -07:00
Nick Sweeting
3512dc7e60
Disable searching for existing chrome user profiles by default 2024-03-14 00:58:45 -07:00
Nick Sweeting
db3fee1845
Update README.md Browser Extension link (#1374) 2024-03-07 13:09:20 -08:00
Ricky de Laveaga
86c3e271ad
Update README.md Browser Extension link
Point to GH repo with all browsers, not Chrome Webstore
2024-03-07 09:45:41 -08:00
Ben Muthalaly
f4deb97f59 Add ARGS and EXTRA_ARGS for Mercury extractor 2024-03-05 21:15:38 -06:00
Ben Muthalaly
d8cf09c21e Remove unnecessary variable length args for dedupe 2024-03-05 21:13:45 -06:00
Nick Sweeting
1cf0f37a95
Add COOKIES_FILE support for singlefile extractor (#1372) 2024-03-05 12:19:24 -08:00
Ben Muthalaly
5082d61613 Merge branch 'title-cookies-file' of https://github.com/benmuth/ArchiveBox into title-cookies-file 2024-03-05 02:03:03 -06:00
Ben Muthalaly
4686da91e6 Fix cookies being set incorrectly 2024-03-05 01:48:35 -06:00
Naomi Phillips
a729480b75
Add COOKIES_FILE support for singlefile extractor 2024-03-03 02:32:46 -05:00
Nick Sweeting
62183b4c85
Make it a little easier to run specific tests (#1371) 2024-03-02 05:03:58 -08:00
Ben Muthalaly
d74ddd42ae Flip dedupe precedence order 2024-03-01 14:50:32 -06:00
jim winstead
741ff5f1a8 Make it a little easier to run specific tests
Changes ./bin/test.sh to pass command line options to pytest, and default to
only running tests in the tests/ directory instead of everywhere excluding
a few directories which is more error-prone.

Also keeps the mock_server used in testing quiet so access log entries don't
appear on stdout.
2024-03-01 12:43:53 -08:00
jim winstead
0f402df42f Merge with latest dev 2024-03-01 12:05:43 -08:00
jim winstead
e7119adb0b Add tests for generic_rss and pinboard_rss parsers 2024-03-01 11:27:59 -08:00
jim winstead
9f462a87a8 Use feedparser for RSS parsing in generic_rss and pinboard_rss parsers
The feedparser package has 20 years of history and is very good at parsing
RSS and Atom, so use that instead of ad-hoc regex and XML parsing.

The medium_rss and shaarli_rss parsers weren't touched because they are
probably unnecessary. (The special parse for pinboard is just needed because
of how tags work.)

Doesn't include tests because I haven't figured out how to run them in the
docker development setup.

Fixes #1171
2024-03-01 11:25:45 -08:00
jim winstead
1f828d9441 Add tests for generic_rss and pinboard_rss parsers 2024-03-01 11:22:28 -08:00
Nick Sweeting
a577d1ed23
Merge branch 'dev' into title-cookies-file 2024-02-29 21:29:36 -08:00