171 commits

Author SHA1 Message Date
Nick Sweeting
a680724367 Merge branch 'dev' into search_index_extract_html_text 2023-10-27 23:09:28 -07:00
Ross Williams
310b4d1242 Add htmltotext extractor
Saves HTML text nodes and selected element attributes in
`htmltotext.txt` for each Snapshot. Primarily intended to be used
for search indexing.
2023-10-23 21:42:32 -04:00
Nick Sweeting
63ad43f46c Merge branch 'dev' into method_allow_deny 2023-10-20 04:25:44 -07:00
DanielBatteryStapler
94dacc49c7 Fix archive_org icon "exists" 2023-08-15 23:49:54 -04:00
Ross Williams
46e80dd509 Rename URL_(WHITE|BLACK)LIST to URL_(ALLOW|DENY)LIST
Retain aliases for old configuration files
2023-08-02 09:31:48 -04:00
Micah R Ledbetter
1e50ca243e Add FAVICON_PROVIDER option for custom favicon service 2023-05-05 20:42:36 -05:00
Nick Sweeting
8ebf3e2f93 add config option PREVIEW_ORIGINALS to hide original iframes in snapshot detail pages 2022-05-09 19:31:41 -07:00
hannah98
fc3d2bb4dc rename TAG_SEPARATORS to TAG_SEPARATOR_PATTERN 2022-01-06 14:14:41 +00:00
hannah98
049f88def9 Added TAG_SEPARATORS option to supply a regex of characters to use when splitting tags 2021-12-30 20:19:48 +00:00
Nick Sweeting
d7f01922f3 fix direct assignment of tags to many-to-many set 2021-12-23 12:29:17 -05:00
Nick Sweeting
b1b7ee2b85 Update sql.py 2021-12-23 12:17:55 -05:00
hannah98
4b8962b60b Fix #725 - correctly parse tags on json import 2021-12-20 08:58:58 -06:00
Nick Sweeting
5a2c78e14b add proper support for URL_WHITELIST instead of using negation regexes 2021-07-06 23:42:00 -04:00
Nick Sweeting
e4974d3536 support negation patterns by checking both re.search and re.match 2021-07-06 23:17:05 -04:00
Nick Sweeting
2c6f0a96bf fix extra arg 2021-04-13 02:21:51 -04:00
Nick Sweeting
50b341baab bail out if old index.json is found during init but doesnt contain links 2021-04-12 16:51:45 -04:00
Nick Sweeting
a9986f1f05 add timezone support, tons of CSS and layout improvements, more detailed snapshot admin form info, ability to sort by recently updated, better grid view styling, better table layouts, better dark mode support 2021-04-10 04:21:36 -04:00
Nick Sweeting
59d5423483 fix snapshot icon caching and ordering 2021-04-01 02:22:15 -04:00
Nick Sweeting
36f0646501 Merge pull request #669 from FliegendeWurst/fix-issue-235
add command: --parser option (fixes #235)
2021-03-31 00:53:47 -04:00
FliegendeWurst
60bd9a902e add command: --parser option 2021-03-28 10:09:11 +02:00
Nick Sweeting
1cabde3ccd remove atomic transactions 2021-02-28 22:54:40 -05:00
Nick Sweeting
46a4197514 fix tests 2021-02-18 04:26:56 -05:00
Nick Sweeting
75e1bfd0a9 create_or_update ArchiveResults from history instead of get_or_create 2021-02-18 02:34:20 -05:00
Nick Sweeting
265bcc0264 fix lint errors2 2021-02-16 16:29:41 -05:00
Nick Sweeting
6f0eec92eb fix lint errors 2021-02-16 16:26:48 -05:00
Nick Sweeting
05e891632c add snapshot_id to Link and uuid to ArchiveResult 2021-02-16 15:54:27 -05:00
Nick Sweeting
8b236b9367 cache dir size, snapshot icons, tags str, and title in django cache 2021-02-16 15:49:29 -05:00
Nick Sweeting
bdf1b102be load ArchiveResults from orphaned links history during init 2021-02-16 06:20:05 -05:00
Nick Sweeting
988a10a9f6 fix warc path in snapshot_icons 2021-02-16 06:18:05 -05:00
Nick Sweeting
82de67db34 fix missing/outdated template variables 2021-02-16 01:23:31 -05:00
Nick Sweeting
33d180afe7 allow filtering snapshots by timestamp in list, update, and remove cmds 2021-02-15 20:48:35 -05:00
Nick Sweeting
78463c243a remove unused GIT_SHA config option 2021-02-15 20:42:33 -05:00
Nick Sweeting
6705354e57 fix assertion 2021-02-08 23:24:48 -05:00
Nick Sweeting
a49884ade8 fix emptystrings in cmd_version causing exception 2021-02-08 23:22:02 -05:00
Nick Sweeting
534ead2440 use the db exclusively for icons instead of hammering filesystem 2021-02-01 02:18:13 -05:00
Nick Sweeting
923f517a8f minor fixes 2021-02-01 02:17:54 -05:00
Nick Sweeting
54c5331693 check for output existance when rendering files icons 2021-01-30 22:04:14 -05:00
Nick Sweeting
15e87353bd only show archive.org if enabled 2021-01-30 22:03:59 -05:00
Nick Sweeting
24e24934f7 add headers.json and fix relative singlefile path resolving for sonic 2021-01-30 21:59:34 -05:00
Nick Sweeting
d6de04a83a fix lgtm errors 2021-01-30 06:07:35 -05:00
Nick Sweeting
cc80ceb0a2 fix icons in public index 2021-01-30 05:47:55 -05:00
Nick Sweeting
1ce0eca217 add trailing slashes to canonical paths 2021-01-30 05:47:55 -05:00
Nick Sweeting
a98298103d cleanup templates and views 2021-01-30 05:47:55 -05:00
Nick Sweeting
f6c3683ab8 fix snapshot favicon loading spinner height 2021-01-29 00:15:32 -05:00
Nick Sweeting
5c54bcc1f3 fix files icons greying out on public index 2021-01-28 22:57:12 -05:00
Nick Sweeting
f0040580c8 fix files icons escaping 2021-01-28 22:27:17 -05:00
Preston Maness
1810426774 Remove now-unused mark_safe import 2021-01-25 21:16:06 -06:00
Preston Maness
b647581115 Update archivebox/index/html.py
mark_safe is dangerous, as the URL's filename could have malicious HTML fragments in it.

Co-authored-by: Nick Sweeting <git@sweeting.me>
2021-01-25 20:47:57 -06:00
Preston Maness
1989275944 Fix issue #617 by using mark_safe in combination with format_html
I have no experience with Django, so all I'm really going off of is this
stackoverflow

https://stackoverflow.com/a/64498319

which cited this bit of Django documentation:

https://docs.djangoproject.com/en/3.1/ref/utils/#django.utils.html.format_html

After using this method, I no longer get the 500 error or KeyError
exception, and can browse the local server and interact with the single
entry in it (the problematic URL in ArchiveBox#617 with curly braces).

Whether this is the "right" method or not, I have no idea. But it is at
least a start.
2021-01-23 20:32:56 -06:00
Tim Gates
7bf63d91ff docs: fix simple typo, timstamp -> timestamp
There is a small typo in archivebox/index/__init__.py.

Should read `timestamp` rather than `timstamp`.
2021-01-06 20:03:40 +02:00