Nick Sweeting
a680724367
Merge branch 'dev' into search_index_extract_html_text
2023-10-27 23:09:28 -07:00
Ross Williams
310b4d1242
Add htmltotext extractor
Saves HTML text nodes and selected element attributes in
`htmltotext.txt` for each Snapshot. Primarily intended to be used
for search indexing.
2023-10-23 21:42:32 -04:00
Nick Sweeting
63ad43f46c
Merge branch 'dev' into method_allow_deny
2023-10-20 04:25:44 -07:00
DanielBatteryStapler
94dacc49c7
Fix archive_org icon "exists"
2023-08-15 23:49:54 -04:00
Ross Williams
46e80dd509
Rename URL_(WHITE|BLACK)LIST to URL_(ALLOW|DENY)LIST
Retain aliases for old configuration files
2023-08-02 09:31:48 -04:00
Micah R Ledbetter
1e50ca243e
Add FAVICON_PROVIDER option for custom favicon service
2023-05-05 20:42:36 -05:00
Nick Sweeting
8ebf3e2f93
add config option PREVIEW_ORIGINALS to hide original iframes in snapshot detail pages
2022-05-09 19:31:41 -07:00
hannah98
fc3d2bb4dc
rename TAG_SEPARATORS to TAG_SEPARATOR_PATTERN
2022-01-06 14:14:41 +00:00
hannah98
049f88def9
Added TAG_SEPARATORS option to supply a regex of characters to use when splitting tags
2021-12-30 20:19:48 +00:00
Nick Sweeting
d7f01922f3
fix direct assignment of tags to many-to-many set
2021-12-23 12:29:17 -05:00
Nick Sweeting
b1b7ee2b85
Update sql.py
2021-12-23 12:17:55 -05:00
hannah98
4b8962b60b
Fix #725 - correctly parse tags on json import
2021-12-20 08:58:58 -06:00
Nick Sweeting
5a2c78e14b
add proper support for URL_WHITELIST instead of using negation regexes
2021-07-06 23:42:00 -04:00
Nick Sweeting
e4974d3536
support negation patterns by checking both re.search and re.match
2021-07-06 23:17:05 -04:00
Nick Sweeting
2c6f0a96bf
fix extra arg
2021-04-13 02:21:51 -04:00
Nick Sweeting
50b341baab
bail out if old index.json is found during init but doesn't contain links
2021-04-12 16:51:45 -04:00
Nick Sweeting
a9986f1f05
add timezone support, tons of CSS and layout improvements, more detailed snapshot admin form info, ability to sort by recently updated, better grid view styling, better table layouts, better dark mode support
2021-04-10 04:21:36 -04:00
Nick Sweeting
59d5423483
fix snapshot icon caching and ordering
2021-04-01 02:22:15 -04:00
Nick Sweeting
36f0646501
Merge pull request #669 from FliegendeWurst/fix-issue-235
add command: --parser option (fixes #235 )
2021-03-31 00:53:47 -04:00
FliegendeWurst
60bd9a902e
add command: --parser option
2021-03-28 10:09:11 +02:00
Nick Sweeting
1cabde3ccd
remove atomic transactions
2021-02-28 22:54:40 -05:00
Nick Sweeting
46a4197514
fix tests
2021-02-18 04:26:56 -05:00
Nick Sweeting
75e1bfd0a9
create_or_update ArchiveResults from history instead of get_or_create
2021-02-18 02:34:20 -05:00
Nick Sweeting
265bcc0264
fix lint errors2
2021-02-16 16:29:41 -05:00
Nick Sweeting
6f0eec92eb
fix lint errors
2021-02-16 16:26:48 -05:00
Nick Sweeting
05e891632c
add snapshot_id to Link and uuid to ArchiveResult
2021-02-16 15:54:27 -05:00
Nick Sweeting
8b236b9367
cache dir size, snapshot icons, tags str, and title in django cache
2021-02-16 15:49:29 -05:00
Nick Sweeting
bdf1b102be
load ArchiveResults from orphaned links history during init
2021-02-16 06:20:05 -05:00
Nick Sweeting
988a10a9f6
fix warc path in snapshot_icons
2021-02-16 06:18:05 -05:00
Nick Sweeting
82de67db34
fix missing/outdated template variables
2021-02-16 01:23:31 -05:00
Nick Sweeting
33d180afe7
allow filtering snapshots by timestamp in list, update, and remove cmds
2021-02-15 20:48:35 -05:00
Nick Sweeting
78463c243a
remove unused GIT_SHA config option
2021-02-15 20:42:33 -05:00
Nick Sweeting
6705354e57
fix assertion
2021-02-08 23:24:48 -05:00
Nick Sweeting
a49884ade8
fix empty strings in cmd_version causing exception
2021-02-08 23:22:02 -05:00
Nick Sweeting
534ead2440
use the db exclusively for icons instead of hammering filesystem
2021-02-01 02:18:13 -05:00
Nick Sweeting
923f517a8f
minor fixes
2021-02-01 02:17:54 -05:00
Nick Sweeting
54c5331693
check for output existence when rendering files icons
2021-01-30 22:04:14 -05:00
Nick Sweeting
15e87353bd
only show archive.org if enabled
2021-01-30 22:03:59 -05:00
Nick Sweeting
24e24934f7
add headers.json and fix relative singlefile path resolving for sonic
2021-01-30 21:59:34 -05:00
Nick Sweeting
d6de04a83a
fix lgtm errors
2021-01-30 06:07:35 -05:00
Nick Sweeting
cc80ceb0a2
fix icons in public index
2021-01-30 05:47:55 -05:00
Nick Sweeting
1ce0eca217
add trailing slashes to canonical paths
2021-01-30 05:47:55 -05:00
Nick Sweeting
a98298103d
cleanup templates and views
2021-01-30 05:47:55 -05:00
Nick Sweeting
f6c3683ab8
fix snapshot favicon loading spinner height
2021-01-29 00:15:32 -05:00
Nick Sweeting
5c54bcc1f3
fix files icons greying out on public index
2021-01-28 22:57:12 -05:00
Nick Sweeting
f0040580c8
fix files icons escaping
2021-01-28 22:27:17 -05:00
Preston Maness
1810426774
Remove now-unused mark_safe import
2021-01-25 21:16:06 -06:00
Preston Maness
b647581115
Update archivebox/index/html.py
mark_safe is dangerous, as the URL's filename could have malicious HTML fragments in it.
Co-authored-by: Nick Sweeting <git@sweeting.me>
2021-01-25 20:47:57 -06:00
Preston Maness
1989275944
Fix issue #617 by using mark_safe in combination with format_html
I have no experience with Django, so all I'm really going off of is this
stackoverflow
https://stackoverflow.com/a/64498319
which cited this bit of Django documentation:
https://docs.djangoproject.com/en/3.1/ref/utils/#django.utils.html.format_html
After using this method, I no longer get the 500 error or KeyError
exception, and can browse the local server and interact with the single
entry in it (the problematic URL in ArchiveBox#617 with curly braces).
Whether this is the "right" method or not, I have no idea. But it is at
least a start.
2021-01-23 20:32:56 -06:00
Tim Gates
7bf63d91ff
docs: fix simple typo, timstamp -> timestamp
There is a small typo in archivebox/index/__init__.py.
Should read `timestamp` rather than `timstamp`.
2021-01-06 20:03:40 +02:00