This is a compilation of data science work I have written in Python over the years. Feel free to use, share, and add to any of the tools.
- I like keeping my Jupyter notebook files clean and consistent, so I use custom templates for various tasks. Here is my template for machine learning projects.
- Notebook includes:
  - importing libraries and data
  - exploring the dataset: data types, missing values, metrics, visualizations
  - preprocessing: dropping columns, imputing missing values, encoding categorical values, feature scaling
  - splitting train and test data (a sketch of this flow follows the package list)
- Packages: NumPy, Pandas, Matplotlib, sklearn
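For illustration, here is a minimal sketch of the template's flow, assuming a CSV file, a numeric target column named target, and a column unneeded_column (all placeholder names; the real template may differ in detail):

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Import data (file name is a placeholder)
df = pd.read_csv("data.csv")

# Explore the dataset: data types, missing values, basic metrics
df.info()
print(df.isna().sum())
print(df.describe())

# Preprocess: drop an unneeded column, impute, encode, scale
df = df.drop(columns=["unneeded_column"])
target = "target"  # hypothetical target column
num_cols = [c for c in df.select_dtypes(include="number").columns if c != target]
df[num_cols] = SimpleImputer(strategy="median").fit_transform(df[num_cols])
df = pd.get_dummies(df, drop_first=True)  # encode categorical values
df[num_cols] = StandardScaler().fit_transform(df[num_cols])

# Split train and test data
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=[target]), df[target], test_size=0.2, random_state=42
)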
- In this notebook I analyzed TED talks related to climate change and environmental issues.
- Questions: How did climate-change-related topics change over time? How do views and likes differ between topics? (A sketch of the approach follows the package list.)
- Packages: NumPy, Pandas, re, Matplotlib
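As an illustration, here is a minimal sketch of the kind of filtering and aggregation these questions call for, assuming columns named topics, published_date, views, and likes (hypothetical names; the real notebook may differ):

import re
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("ted_talks.csv")  # placeholder path

# Flag climate-related talks via a keyword pattern on the topics column
pattern = re.compile(r"climate|environment|sustainab", re.IGNORECASE)
df["climate_related"] = df["topics"].astype(str).str.contains(pattern)

# How did climate-change-related topics change over time?
df["year"] = pd.to_datetime(df["published_date"]).dt.year
per_year = df.groupby("year")["climate_related"].mean()
per_year.plot(title="Share of climate-related TED talks per year")
plt.show()

# How do views and likes differ between topics?
print(df.groupby("climate_related")[["views", "likes"]].mean())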
- This program downloads and processes data from the YouTube API and uploads it to a MySQL database.
- Packages: requests, pandas, time, mysql
import pandas as pd

API_KEY = "ENTER"
CHANNEL_ID = "ENTER"

# Get video data from YouTube (get_videos and the other helpers are defined above)
df = pd.DataFrame()
df = get_videos(df)

# Connect to database
host = "ENTER"
user = "ENTER"
password = "ENTER"
database = "ENTER"
mydb = connect_to_db(host, user, password, database)

# Create cursor for navigating the database
mycursor = mydb.cursor()

# Create table if it does not yet exist
create_table(mycursor)

# Update existing rows and return new rows as a DataFrame
new_vid_df = update_db(mycursor, df)

# Append new rows to the table
append_from_df_to_db(mycursor, new_vid_df)

# Commit all changes
mydb.commit()
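The helper functions themselves are defined earlier in the script. As an illustration, a minimal connect_to_db matching the call above could look like this, using mysql-connector-python (a sketch, not necessarily the version used here):

import mysql.connector

def connect_to_db(host, user, password, database):
    # Open a connection to the MySQL server; credentials come from the
    # placeholder variables above
    return mysql.connector.connect(
        host=host, user=user, password=password, database=database
    )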
- This program lets you pull data from recent tweets, transform it into a tidy pandas DataFrame, and write it to your hard drive.
- Packages: requests, json, pandas, tweepy
Results in:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 30 entries, 0 to 29
Data columns (total 12 columns):
 #   Column                        Non-Null Count  Dtype
---  ------                        --------------  -----
 0   id                            30 non-null     object
 1   created_at                    30 non-null     object
 2   author_id                     30 non-null     object
 3   lang                          30 non-null     object
 4   text                          30 non-null     object
 5   source                        30 non-null     object
 6   public_metrics.retweet_count  30 non-null     int64
 7   public_metrics.reply_count    30 non-null     int64
 8   public_metrics.like_count     30 non-null     int64
 9   public_metrics.quote_count    30 non-null     int64
 10  name                          30 non-null     object
 11  username                      30 non-null     object
dtypes: int64(4), object(8)
memory usage: 3.0+ KB
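For reference, here is a minimal sketch of such a pipeline using tweepy's v2 client. The query string, credentials, and file name are placeholders, and the real script may differ:

import pandas as pd
import tweepy

client = tweepy.Client(bearer_token="ENTER")  # placeholder credential

response = client.search_recent_tweets(
    query="data science -is:retweet",  # hypothetical query
    max_results=30,
    tweet_fields=["created_at", "author_id", "lang", "source", "public_metrics"],
    expansions=["author_id"],
    user_fields=["name", "username"],
)

# Flatten the tweets; the nested public_metrics dict becomes dotted columns
df = pd.json_normalize([tweet.data for tweet in response.data])

# Attach author name and username from the user expansion
users = pd.DataFrame(
    [{"author_id": u.id, "name": u.name, "username": u.username}
     for u in response.includes["users"]]
)
df = df.merge(users.astype({"author_id": str}), on="author_id", how="left")

# Write to disk
df.to_csv("tweets.csv", index=False)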
- This program lets you look up the county (or state, country, or country code) for any given latitude and longitude values. It works on big DataFrames; in my case, processing around 17,000 rows took about two hours.
- Packages: requests, pandas, time, mysql, json, functools, tqdm, missingno
import time
import requests
from functools import cache
from tqdm import tqdm

# Caching, i.e. reusing past request results for the same lat and long values,
# is required by the API provider's usage policy
@cache
def bar(lat, long):
    url = f"https://nominatim.openstreetmap.org/reverse?format=geojson&lat={lat}&lon={long}"
    try:
        # Nominatim requires an identifying User-Agent header (name is a placeholder)
        response = requests.get(url, headers={"User-Agent": "my-geocoding-script"})
        time.sleep(1)  # Nominatim allows at most one request per second
        properties = response.json()["features"][0]["properties"]
        return properties["address"]["county"]
    except (requests.RequestException, KeyError, IndexError):
        return None  # If the API call fails or the field is missing, return None

def foo(row):
    return bar(row["latitude"], row["longitude"])

# Provide a progress bar while applying the lookup row by row
tqdm.pandas()
df["county"] = df.progress_apply(foo, axis=1)
- Quick guides for machine learning algorithms.