linkding/bookmarks
Taku Izumi 937858cf58
Fix website scraper decoding content incorrectly (#126)
* Avoid stall on web scraping

This patch fixes a stall during web scraping.
I encountered a stall (scraping never finished) when adding
a bookmark for a certain site.
To avoid this, pass a timeout parameter to the requests.get()
call.

Signed-off-by: Taku Izumi <admin@orz-style.com>
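
The timeout fix described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the 10-second default and the injectable `get` parameter are assumptions added here so the function can be exercised without network access. Note that in requests the timeout bounds the connection attempt and each wait for data from the server, not the total download time.

```python
from typing import Callable, Optional


def load_page(url: str, timeout: float = 10.0, get: Optional[Callable] = None) -> bytes:
    """Fetch url and return the raw body, giving up if the server stalls."""
    if get is None:
        # Third-party dependency, imported lazily so a stub can be injected.
        import requests
        get = requests.get
    # Without timeout=..., requests.get() can wait indefinitely on a silent server.
    response = get(url, timeout=timeout)
    return response.content
```

A caller would normally wrap this in a try/except for `requests.exceptions.Timeout` and treat a timed-out page as having no scrapable metadata.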

* Avoid character corruption when scraping some Japanese sites

This patch fixes character corruption when scraping some Japanese
sites. To avoid the corruption, I use r.content instead
of r.text in the load_page function.

I think the cause of the corruption is an encoding problem.
r.text returns the body already decoded to a Unicode string, using
the encoding that requests infers for the response. If that does not
match the site's actual charset, character corruption occurs.
r.content returns the raw bytes, so no decoding takes place and the
encoding problem is avoided.

Signed-off-by: Taku Izumi <admin@orz-style.com>
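
The encoding problem described above can be reproduced with the standard library alone. The sketch below is illustrative and assumes a Shift_JIS page whose response carries no charset, in which case requests falls back to ISO-8859-1 when building r.text for text/* content. The sample string and variable names are made up for the example.

```python
original = "日本語のページ"

# What r.content would hold: the raw, undecoded bytes sent by the server.
raw = original.encode("shift_jis")

# What r.text effectively does when it assumes the wrong charset:
# decoding succeeds, but the result is mojibake.
wrong = raw.decode("iso-8859-1")

# Decoding the same bytes with the page's real charset restores the text.
right = raw.decode("shift_jis")

print(repr(wrong))
print(right)
```

Keeping the bytes and deferring the decoding step is what makes it possible to choose the charset later, instead of trusting the value inferred from the response headers.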

* use charset_normalizer to determine response encoding
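
One way this step could look is sketched below, using charset_normalizer's `from_bytes(...).best()` entry point to guess the encoding from the bytes themselves. The function name `decode_page`, the UTF-8 fallback, and the guarded import are assumptions made for this illustration, not necessarily what the commit does.

```python
def decode_page(content: bytes, fallback: str = "utf-8") -> str:
    """Decode raw response bytes using the best-matching detected charset."""
    try:
        # Third-party package; guarded so the sketch still runs without it.
        from charset_normalizer import from_bytes
    except ImportError:
        return content.decode(fallback, errors="replace")

    best = from_bytes(content).best()
    if best is None:
        # No plausible encoding found: fall back rather than fail (assumption).
        return content.decode(fallback, errors="replace")
    # str() on a CharsetMatch yields the text decoded with the detected encoding.
    return str(best)
```

It would be called on the raw body, for example `decode_page(r.content)`, so that a wrong or missing charset in the response headers does not affect the result.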

Co-authored-by: Taku Izumi <admin@orz-style.com>
Co-authored-by: Sascha Ißbrücker <sascha.issbruecker@googlemail.com>
2021-08-25 10:16:23 +02:00
Name                 Last commit                                               Date
api                  Mark optional fields in bookmark serializer (#78)         2021-02-18 22:02:45 +01:00
components           Implement bulk edit (#101)                                2021-03-29 00:43:50 +02:00
management/commands  Create docker image                                       2019-07-03 17:18:29 +02:00
migrations           Display date_added in bookmark list (#85)                 2021-03-31 09:08:19 +02:00
services             Fix website scraper decoding content incorrectly (#126)  2021-08-25 10:16:23 +02:00
static               Implement bulk edit (#101)                                2021-03-29 00:43:50 +02:00
styles               Allow editing of scraped values (#80)                     2021-04-04 10:16:40 +02:00
templates            Add about section in settings (#134)                      2021-08-24 19:47:58 +02:00
templatetags         Display date_added in bookmark list (#85)                 2021-03-31 09:08:19 +02:00
tests                Fix importer not validating bookmark models (#149)        2021-08-25 09:20:01 +02:00
views                Add about section in settings (#134)                      2021-08-24 19:47:58 +02:00
__init__.py          Implement basic bookmark page                             2019-06-27 08:09:51 +02:00
admin.py             Upgrade Django major (#144)                               2021-08-17 05:48:45 +02:00
apps.py              Implement basic bookmark page                             2019-06-27 08:09:51 +02:00
models.py            Display date_added in bookmark list (#85)                 2021-03-31 09:08:19 +02:00
queries.py           improve tag query performance (#142)                      2021-08-15 09:28:40 +02:00
urls.py              Implement bulk edit (#101)                                2021-03-29 00:43:50 +02:00
utils.py             Fix relative date formatting (#107)                       2021-04-06 23:38:15 +02:00
validators.py        Add option to disable bookmark URL validation (#57)       2021-02-06 16:27:19 +01:00