This repository was archived by the owner on Dec 20, 2019. It is now read-only.

isearch-gp/gp-flask

gp-flask

Google proxy API built with Flask, using Python, Requests and BeautifulSoup

I originally tried to "port" Googler to an API but found it much easier to do the web scraping myself. A lot of functionality still needs to be added (see ToDo below).

The proxy can also display the parsed results as a web page and the raw HTML that Google returns (for debugging).

Usage:

lucky.py - Python Web Scraping API in Flask

        Options:
        -h   --help       this message
        -v N --verbose=N  verbose output

                 0 = Info
                 3 = JSON payload counts
                 5 = JSON payload elements
                 6 = raw JSON payload
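
The options above could be parsed with the standard library's argparse. This is only a sketch, not the actual code in lucky.py: the parse_args helper, the VERBOSITY_LEVELS table and the default level of 0 are assumptions made for illustration.

```python
import argparse

# Verbosity levels as listed in the usage text.
VERBOSITY_LEVELS = {
    0: "Info",
    3: "JSON payload counts",
    5: "JSON payload elements",
    6: "raw JSON payload",
}


def parse_args(argv=None):
    """Parse -v N / --verbose=N; argparse supplies -h / --help itself."""
    parser = argparse.ArgumentParser(
        prog="lucky.py",
        description="Python Web Scraping API in Flask",
    )
    parser.add_argument(
        "-v", "--verbose",
        type=int,
        default=0,  # assumption: no flag means level 0 (Info)
        metavar="N",
        help="verbose output (0, 3, 5 or 6)",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print("verbosity:", args.verbose, VERBOSITY_LEVELS.get(args.verbose, ""))
```

Both `-v 5` and `--verbose=5` set `args.verbose` to 5 with this sketch.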

Python Dev setup

activate the virtual environment (venv), or use workon hello (virtualenvwrapper)

C:\Users\x\Documents\GitHub\gp-flask>.\venv\Scripts\activate

C:\Users\x\Documents\GitHub\gp-flask>workon hello

deactivate

(hello) C:\Users\x\Documents\GitHub\gp-flask>deactivate

run the Flask app

(hello) C:\Users\x\Documents\GitHub\gp-flask>python lucky.py

Endpoints to test

Show response as web page (Raw HTML - what Google returns) http://localhost:5000/raw?q=malpractice

Show response as web page (from parsed response data) http://localhost:5000/search?q=malpractice

Send response as JSON (for API) http://localhost:5000/json?q=malpractice
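
When testing with queries that contain spaces or other special characters, the URLs above need to be encoded. A small helper along these lines could build them. It is only a sketch: endpoint_url, BASE_URL and ENDPOINTS are hypothetical names, and the base URL assumes the app is running locally on Flask's default port as shown above.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:5000"    # assumption: local dev server
ENDPOINTS = ("raw", "search", "json")  # the three routes listed above


def endpoint_url(endpoint, query, base_url=BASE_URL):
    """Return the test URL for one endpoint with the query URL-encoded."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    return f"{base_url}/{endpoint}?{urlencode({'q': query})}"


if __name__ == "__main__":
    for name in ENDPOINTS:
        print(endpoint_url(name, "medical malpractice"))
```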

This can also be done interactively with Python on the command line:

(hello) C:\Users\x\Documents\GitHub\gp-flask>python

>>> import requests
>>> response = requests.get("http://127.0.0.1:5000/json?q=malpractice")
>>> response.json()

or with cURL:

curl http://127.0.0.1:5000/json?q=malpractice

Check indentation in .py files before check-in

python -m tabnanny lucky.py

Advanced Topics (ToDo)

  • CI Testing
  • API Testing
  • Handling Network Errors

Scraper stuff

  • Sessions and Cookies
  • Delays and Backing Off
  • Spoofing and Cycling the User Agent
  • Using Proxy Servers
  • Setting Timeouts
  • Use Selenium web driver
  • Use PhantomJS for headless JS support
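
None of the scraper items above is implemented yet. As a rough sketch of how "Delays and Backing Off" and "Spoofing and Cycling the User Agent" might look, the following uses only the standard library. The function names, the default values and the User-Agent strings are made up for illustration and are not taken from lucky.py.

```python
import itertools
import random

# Hypothetical User-Agent strings; real ones would be copied from browsers.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/2.0",
]


def backoff_delays(retries, base=1.0, cap=30.0, jitter=0.0):
    """Yield one delay (seconds) per retry: base * 2**attempt, capped at cap.

    A random amount between 0 and jitter is added to each delay so that
    repeated requests do not arrive at perfectly regular intervals.
    """
    for attempt in range(retries):
        delay = min(cap, base * 2 ** attempt)
        yield delay + random.uniform(0, jitter)


def user_agent_cycle(agents=USER_AGENTS):
    """Return an endless iterator that rotates through the given agents."""
    return itertools.cycle(agents)
```

Each retry could then sleep for the next delay (`time.sleep(delay)`) and send the next agent, for example `requests.get(url, headers={"User-Agent": next(agents)}, timeout=10)`, which would also cover the "Setting Timeouts" item.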

Service stuff

  • Authentication
  • Logging

Links

http://timmyreilly.azurewebsites.net/python-pip-virtualenv-installation-on-windows/

https://blog.hartleybrody.com/web-scraping-cheat-sheet/

More here: Iterative Search