Mirror of https://github.com/vinta/awesome-python, synced 2024-11-15 08:17:07 +00:00
Improve the concept of web crawling
parent 21ab061232
commit 9b8f935f2c
1 changed file with 3 additions and 3 deletions
@@ -85,7 +85,7 @@ Inspired by [awesome-php](https://github.com/ziadoz/awesome-php).
 - [URL Manipulation](#url-manipulation)
 - [Video](#video)
 - [Web Content Extracting](#web-content-extracting)
-- [Web Crawling](#web-crawling)
+- [Web Crawling & Web Scraping](#web-crawling-&-web-scraping)
 - [Web Frameworks](#web-frameworks)
 - [WebSocket](#websocket)
 - [WSGI Servers](#wsgi-servers)
@@ -1203,9 +1203,9 @@ Inspired by [awesome-php](https://github.com/ziadoz/awesome-php).
 * [textract](https://github.com/deanmalmgren/textract) - Extract text from any document, Word, PowerPoint, PDFs, etc.
 * [toapi](https://github.com/gaojiuli/toapi) - Every web site provides APIs.
 
-## Web Crawling
+## Web Crawling & Web Scraping
 
-*Libraries for scraping websites.*
+*Libraries to automate data extraction from websites.*
 
 * [cola](https://github.com/chineking/cola) - A distributed crawling framework.
 * [Demiurge](https://github.com/matiasb/demiurge) - PyQuery-based scraping micro-framework.