Skip to content

Latest commit

 

History

History
133 lines (113 loc) · 6.08 KB

web-crawling.md

File metadata and controls

133 lines (113 loc) · 6.08 KB

Bookmarks tagged [web-crawling]

https://github.com/chriskite/anemone

Ruby library and CLI for crawling websites.


https://github.com/gottfrois/link_thumbnailer

Ruby gem that generates thumbnail images and videos from a given URL. Much like popular social website with link preview.


https://github.com/sparklemotion/mechanize

Mechanize is a ruby library that makes automated web interaction easy.


https://github.com/jaimeiniesta/metainspector

Ruby gem for web scraping purposes.


https://github.com/propublica/upton

A batteries-included framework for easy web-scraping.


https://github.com/felipecsl/wombat

Web scraper with an elegant DSL that parses structured data from web pages.


https://github.com/chineking/cola

A distributed crawling framework.


https://pythonhosted.org/feedparser/

Universal feed parser.


https://github.com/lorien/grab

Site scraping framework.


https://github.com/MechanicalSoup/MechanicalSoup

A Python library for automating interaction with websites.


https://github.com/scrapinghub/portia

Visual scraping for Scrapy.


https://github.com/binux/pyspider

A powerful spider system.


https://github.com/jmcarp/robobrowser

A simple, Pythonic library for browsing the web without a standalone web browser.


https://scrapy.org/

A fast high-level screen scraping and web crawling framework.


https://nutch.apache.org

Highly extensible, highly scalable web crawler for production environments.


https://github.com/yasserg/crawler4j

Simple and lightweight web crawler.


https://jsoup.org

Scrapes, parses, manipulates and cleans HTML.


http://stormcrawler.net

SDK for building low-latency and scalable web crawlers.


https://github.com/code4craft/webmagic

Scalable crawler with downloading, url management, content extraction and persistent.