Bookmarks tagged [web-crawling]
https://github.com/chriskite/anemone
Ruby library and CLI for crawling websites.
- tags: ruby, web-crawling
source code
https://github.com/gottfrois/link_thumbnailer
Ruby gem that generates thumbnail images and videos from a given URL, much like the link previews on popular social websites.
- tags: ruby, web-crawling
source code
https://github.com/sparklemotion/mechanize
Mechanize is a Ruby library that makes automated web interaction easy.
- tags: ruby, web-crawling
source code
https://github.com/jaimeiniesta/metainspector
Ruby gem for web scraping purposes.
- tags: ruby, web-crawling
source code
https://github.com/propublica/upton
A batteries-included framework for easy web-scraping.
- tags: ruby, web-crawling
source code
https://github.com/felipecsl/wombat
Web scraper with an elegant DSL that parses structured data from web pages.
- tags: ruby, web-crawling
source code
https://github.com/chineking/cola
A distributed crawling framework.
- tags: python, web-crawling, web-scraping
source code
https://pythonhosted.org/feedparser/
Universal feed parser.
- tags: python, web-crawling, web-scraping
https://github.com/lorien/grab
Site scraping framework.
- tags: python, web-crawling, web-scraping
source code
https://github.com/MechanicalSoup/MechanicalSoup
A Python library for automating interaction with websites.
- tags: python, web-crawling, web-scraping
source code
https://github.com/scrapinghub/portia
Visual scraping for Scrapy.
- tags: python, web-crawling, web-scraping
source code
https://github.com/binux/pyspider
A powerful spider system.
- tags: python, web-crawling, web-scraping
source code
https://github.com/jmcarp/robobrowser
A simple, Pythonic library for browsing the web without a standalone web browser.
- tags: python, web-crawling, web-scraping
source code
A fast high-level screen scraping and web crawling framework.
- tags: python, web-crawling, web-scraping
source code
Highly extensible, highly scalable web crawler for production environments.
- tags: java, web-crawling
https://github.com/yasserg/crawler4j
Simple and lightweight web crawler.
- tags: java, web-crawling
source code
Scrapes, parses, manipulates and cleans HTML.
- tags: java, web-crawling
SDK for building low-latency and scalable web crawlers.
- tags: java, web-crawling
https://github.com/code4craft/webmagic
Scalable crawler with downloading, URL management, content extraction, and persistence.
- tags: java, web-crawling
source code