longzai
4ae765ec27
fix the URL_REGEX used in generic_html parsers
Signed-off-by: longzai <437172242@qq.com>
2024-04-08 04:53:05 +08:00
Nick Sweeting
9d4cc361e6
Update docker-compose.yml
2024-03-27 20:15:27 -07:00
Nick Sweeting
e48159b8a0
cleanup docker-compose by storing crontabs in data dir
2024-03-26 15:24:05 -07:00
Nick Sweeting
ac73fb5129
merge fixes
2024-03-26 15:22:40 -07:00
Nick Sweeting
b4c3aa5097
Merge branch 'main' into dev
2024-03-26 15:01:36 -07:00
Nick Sweeting
a4453b6f87
fix PERSONAS PERSONAS_DIR typo
2024-03-26 14:19:25 -07:00
Nick Sweeting
6981837a0b
Update to Django 4.2.x (#1388)
2024-03-26 14:14:42 -07:00
jim winstead
8b1b01e508
Update to Django 4.2.x, now in LTS until April 2026
2024-03-25 17:46:01 -07:00
Nick Sweeting
1d49bee90b
Update README.md
2024-03-21 00:31:48 -07:00
Nick Sweeting
0521379464
Update README.md
2024-03-21 00:29:54 -07:00
Nick Sweeting
ee2809eb4f
Update README.md
2024-03-21 00:27:49 -07:00
Nick Sweeting
88f21d0d70
Update README.md
2024-03-21 00:12:31 -07:00
Nick Sweeting
2c6704b1d0
Update README.md
2024-03-21 00:11:57 -07:00
Nick Sweeting
1dbe08872c
Update README.md
2024-03-21 00:10:19 -07:00
Nick Sweeting
2220a5350c
Update README.md
2024-03-21 00:02:08 -07:00
Nick Sweeting
a1ef5f6035
Update README.md
2024-03-21 00:00:14 -07:00
Nick Sweeting
28e85e0b95
Update README.md
2024-03-20 23:31:04 -07:00
Nick Sweeting
67baea172e
Update README.md
2024-03-20 23:28:02 -07:00
Nick Sweeting
d9beebdee7
Update README.md
2024-03-20 23:25:06 -07:00
Nick Sweeting
d32413d74b
Update README.md
2024-03-20 23:23:26 -07:00
Nick Sweeting
37c9a33c8b
Update README.md
2024-03-20 23:19:23 -07:00
Nick Sweeting
b921efb0e0
Revise md section not formatting properly in html (#1382)
2024-03-19 12:25:27 -07:00
Nicholas Hebert
e00845f58c
Revise md section not formatting properly in html
2024-03-19 11:13:47 -03:00
Nick Sweeting
8007e97c3f
point archivebox to novnc display container by default
2024-03-18 14:41:57 -07:00
Nick Sweeting
c0b5dbcecb
create new data/personas dir to hold cookies and chrome profiles
2024-03-18 14:41:39 -07:00
Nick Sweeting
c5bb99dce1
explicitly use Default profile inside user data dir
2024-03-18 14:40:40 -07:00
Nick Sweeting
1fc5d7c5c8
add USER_AGENT config option to set all USER_AGENTs at once
2024-03-18 14:39:09 -07:00
Nick Sweeting
0872c84ba7
Add generic_jsonl parser (#1370)
2024-03-14 20:19:21 -07:00
jim winstead
06a0580430
Merge branch 'issue-1369' of github.com:jimwins/ArchiveBox into issue-1369
2024-03-14 15:47:04 -07:00
jim winstead
5478d13d52
Add generic_jsonl parser
Resolves #1369
2024-03-14 15:42:29 -07:00
Nick Sweeting
ca2c484a8e
Add _EXTRA_ARGS for various extractors (#1360)
2024-03-14 01:55:09 -07:00
Nick Sweeting
48f4b12ae2
Use COOKIES_FILE to fetch page titles (#1364)
2024-03-14 01:52:44 -07:00
Nick Sweeting
099f7d00fe
Use feedparser for RSS parsing (#1362)
Fixes #1171
Fixes #870 (probably, would need to test against a Wallabag Atom file to
Fixes #135
Fixes #123
Fixes #106
2024-03-14 01:51:45 -07:00
Nick Sweeting
3512dc7e60
Disable searching for existing chrome user profiles by default
2024-03-14 00:58:45 -07:00
Nick Sweeting
db3fee1845
Update README.md Browser Extension link (#1374)
2024-03-07 13:09:20 -08:00
Ricky de Laveaga
86c3e271ad
Update README.md Browser Extension link
Point to GH repo with all browsers, not Chrome Webstore
2024-03-07 09:45:41 -08:00
Ben Muthalaly
f4deb97f59
Add ARGS and EXTRA_ARGS for Mercury extractor
2024-03-05 21:15:38 -06:00
Ben Muthalaly
d8cf09c21e
Remove unnecessary variable length args for dedupe
2024-03-05 21:13:45 -06:00
Nick Sweeting
1cf0f37a95
Add COOKIES_FILE support for singlefile extractor (#1372)
2024-03-05 12:19:24 -08:00
Ben Muthalaly
5082d61613
Merge branch 'title-cookies-file' of https://github.com/benmuth/ArchiveBox into title-cookies-file
2024-03-05 02:03:03 -06:00
Ben Muthalaly
4686da91e6
Fix cookies being set incorrectly
2024-03-05 01:48:35 -06:00
Naomi Phillips
a729480b75
Add COOKIES_FILE support for singlefile extractor
2024-03-03 02:32:46 -05:00
Nick Sweeting
62183b4c85
Make it a little easier to run specific tests (#1371)
2024-03-02 05:03:58 -08:00
Ben Muthalaly
d74ddd42ae
Flip dedupe precedence order
2024-03-01 14:50:32 -06:00
jim winstead
741ff5f1a8
Make it a little easier to run specific tests
Changes ./bin/test.sh to pass command line options to pytest, and to default to
running only the tests in the tests/ directory instead of everywhere except
a few excluded directories, which is more error-prone.
Also keeps the mock_server used in testing quiet so access log entries don't
appear on stdout.
2024-03-01 12:43:53 -08:00
jim winstead
0f402df42f
Merge with latest dev
2024-03-01 12:05:43 -08:00
jim winstead
e7119adb0b
Add tests for generic_rss and pinboard_rss parsers
2024-03-01 11:27:59 -08:00
jim winstead
9f462a87a8
Use feedparser for RSS parsing in generic_rss and pinboard_rss parsers
The feedparser package has 20 years of history and is very good at parsing
RSS and Atom, so use that instead of ad-hoc regex and XML parsing.
The medium_rss and shaarli_rss parsers weren't touched because they are
probably unnecessary. (The special parser for pinboard is only needed because
of how tags work.)
Doesn't include tests because I haven't figured out how to run them in the
docker development setup.
Fixes #1171
2024-03-01 11:25:45 -08:00
jim winstead
1f828d9441
Add tests for generic_rss and pinboard_rss parsers
2024-03-01 11:22:28 -08:00
Nick Sweeting
a577d1ed23
Merge branch 'dev' into title-cookies-file
2024-02-29 21:29:36 -08:00