Limits the size of scraped HTML documents to prevent out-of-memory errors. The scraper stops reading from the response either when it encounters the closing `</head>` tag or once the content read so far exceeds a maximum size limit.
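A minimal sketch of this streaming approach, assuming requests is used for fetching; the chunk size and byte limit below are illustrative, not the values in the patch:

```python
import requests

# Illustrative limits; the actual values in the patch may differ.
MAX_CONTENT_LIMIT = 5_000_000  # bytes
CHUNK_SIZE = 50_000

def load_page(url: str) -> bytes:
    content = b''
    # stream=True defers the body download so it can be read in chunks
    with requests.get(url, timeout=10, stream=True) as r:
        for chunk in r.iter_content(chunk_size=CHUNK_SIZE):
            content += chunk
            # The <head> holds the metadata the scraper needs, so stop
            # at the closing tag, or once the size limit is exceeded.
            if b'</head>' in content or len(content) > MAX_CONTENT_LIMIT:
                break
    return content
```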
Fixes #345
* Avoid stalls during web scraping
This patch fixes a stall during web scraping. I encountered a stall
(the scrape never finished) when adding a bookmark for a certain site.
Passing a timeout parameter to the requests.get() call avoids this
case, as sketched below.
Signed-off-by: Taku Izumi <admin@orz-style.com>
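A minimal sketch of the timeout fix; the (connect, read) values here are illustrative, not the ones in the patch:

```python
import requests

def load_page(url: str):
    # Without a timeout, requests.get() can block indefinitely on a
    # host that accepts the connection but never sends a response.
    try:
        return requests.get(url, timeout=(10, 20))
    except requests.exceptions.Timeout:
        return None  # give up instead of stalling the bookmark save
```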
* Avoid character corruption when scraping some Japanese sites
This patch fixes character corruption when scraping some Japanese
sites. To avoid the corruption, I use r.content instead of r.text
in the load_page function.
I believe the corruption is an encoding problem: r.text decodes the
response as Unicode text, so if the scraped site's charset is not
the one requests assumes, the decoded characters are corrupted.
r.content returns the raw, undecoded bytes, which avoids the
decoding problem.
Signed-off-by: Taku Izumi <admin@orz-style.com>
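A sketch illustrating the difference, under the assumption that the page declares its charset in a `<meta>` tag rather than the HTTP header:

```python
import requests

r = requests.get('https://example.jp/', timeout=10)
# r.text decodes using the charset declared in the Content-Type
# header; for text/* responses without one, requests falls back to
# ISO-8859-1, which garbles Shift_JIS or EUC-JP pages.
maybe_garbled = r.text
# r.content is the undecoded bytes, so an HTML parser can honor the
# page's own <meta charset> declaration instead.
raw_bytes = r.content
```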
* Use charset_normalizer to determine the response encoding (see the sketch below)
Co-authored-by: Taku Izumi <admin@orz-style.com>
Co-authored-by: Sascha Ißbrücker <sascha.issbruecker@googlemail.com>
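A minimal sketch of the charset_normalizer approach; the UTF-8 fallback is an assumption, not part of the patch:

```python
import requests
from charset_normalizer import from_bytes

def load_page(url: str) -> str:
    r = requests.get(url, timeout=10)
    # Detect the encoding from the payload itself rather than
    # trusting a possibly missing or wrong Content-Type header.
    best = from_bytes(r.content).best()
    if best is not None:
        return str(best)  # decoded using the detected encoding
    # Fallback behavior is an assumption, not part of the patch.
    return r.content.decode('utf-8', errors='replace')
```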