linkding/bookmarks/services
Taku Izumi 937858cf58
Fix website scraper decoding content incorrectly (#126)
* Avoid stalls during web scraping

This patch fixes a stall during web scraping.
I encountered a stall (scraping never finished) when adding
a bookmark for a certain site.
requests.get() applies no timeout by default, so an unresponsive
server blocks the call indefinitely. Passing a timeout parameter
to requests.get() avoids this.
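
A minimal sketch of the timeout fix described above, assuming the
requests library. The name fetch_html, the 10-second value, and the
injectable get parameter are illustrative assumptions, not taken from
the patch:

```python
import requests

# Hypothetical default; the commit message does not state the value used.
DEFAULT_TIMEOUT_SECONDS = 10


def fetch_html(url, timeout=DEFAULT_TIMEOUT_SECONDS, get=requests.get):
    """Return the response body as bytes, or None if the request fails.

    `get` is injectable only so the function can be exercised without
    network access; it defaults to requests.get.
    """
    try:
        # Without `timeout`, requests waits indefinitely for the server.
        response = get(url, timeout=timeout)
    except requests.exceptions.RequestException:
        # Timeout is a subclass of RequestException, so a stalled
        # server ends up here instead of blocking forever.
        return None
    return response.content
```

Note that the timeout bounds the wait for the connection and for each
read from the socket, not the total download time.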

Signed-off-by: Taku Izumi <admin@orz-style.com>

* Avoid character corruption when scraping some Japanese sites

This patch fixes character corruption when scraping some Japanese
sites. To avoid the corruption, I use r.content instead of r.text
in the load_page function.

I think the cause is an encoding problem. r.text decodes the
response body to str using the encoding that requests derives from
the HTTP headers, so if a site does not declare its charset
correctly, the body is decoded with the wrong encoding and the
characters are corrupted. r.content returns the raw bytes without
decoding, so it avoids this problem.
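
The corruption can be reproduced without any network access. The
sketch below uses only the standard library. The sample title and the
choice of Shift_JIS are assumptions for illustration. ISO-8859-1 is
used as the wrong charset because requests falls back to it for
text/* responses whose Content-Type header names no charset:

```python
# A page title as a Japanese site might serve it, encoded as Shift_JIS.
title = "日本語のタイトル"
raw_body = title.encode("shift_jis")  # analogous to r.content (bytes)

# Decoding with the wrong charset does not fail, it silently yields garbage:
# ISO-8859-1 maps every possible byte to a character.
corrupted = raw_body.decode("iso-8859-1")

# Decoding with the charset the page was written in recovers the text.
recovered = raw_body.decode("shift_jis")
```

Keeping the body as bytes postpones the decoding decision until the
correct charset is known.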

Signed-off-by: Taku Izumi <admin@orz-style.com>

* Use charset_normalizer to determine the response encoding
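
One way to detect the encoding from the raw bytes, sketched with
charset_normalizer's from_bytes(). The helper name decode_body and the
UTF-8 fallback are assumptions of this sketch, not necessarily what
the patch does:

```python
from charset_normalizer import from_bytes


def decode_body(content: bytes) -> str:
    """Decode raw response bytes using the best-matching detected charset."""
    best_match = from_bytes(content).best()
    if best_match is None:
        # Detection found no plausible encoding; fall back to UTF-8 and
        # substitute undecodable bytes (an assumption of this sketch).
        return content.decode("utf-8", errors="replace")
    # str() on a CharsetMatch returns the decoded text.
    return str(best_match)
```

With a requests response r, this would be called as
decode_body(r.content), which ignores the charset declared in the HTTP
headers and relies on the content alone.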

Co-authored-by: Taku Izumi <admin@orz-style.com>
Co-authored-by: Sascha Ißbrücker <sascha.issbruecker@googlemail.com>
2021-08-25 10:16:23 +02:00
__init__.py Implement add bookmark route 2019-06-28 19:37:41 +02:00
bookmarks.py Implement bulk edit (#101) 2021-03-29 00:43:50 +02:00
exporter.py Implement bookmark export 2019-12-26 13:45:12 +01:00
importer.py Fix importer not validating bookmark models (#149) 2021-08-25 09:20:01 +02:00
parser.py Add settings view tests 2021-05-14 23:34:53 +02:00
tags.py Fix duplicate tag error (#65) 2021-01-12 22:42:56 +01:00
website_loader.py Fix website scraper decoding content incorrectly (#126) 2021-08-25 10:16:23 +02:00