Andy's Portfolio

This is where I host some of my data analytics projects.

Section 1: Competition/Self-Initiated Projects

AI200 Capstone Project: Human or Robot?

This Prediction Challenge is modelled after the 2015 Facebook Recruitment Challenge, a prediction competition jointly held by Facebook and Kaggle.

In this competition, we are chasing down robots for an online auction site. Human bidders on the site are becoming increasingly frustrated with their inability to win auctions vs. their software-controlled counterparts. As a result, usage from the site's core customer base is plummeting.

To rebuild customer happiness, the site owners need to eliminate computer-generated bidding from their auctions. Their attempt at building a model to identify these bids using behavioral data, including bid frequency over short periods of time, has proven insufficient.

The goal of this competition is to identify online auction bids that are placed by "robots", helping the site owners easily flag these users for removal from their site to prevent unfair auction activity.

Through this project, I implemented the following:

More than 70 features created via feature engineering
Hyperparameter tuning for a Random Forest model
Achieved a public score of 0.90561 and private score of 0.89190.
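
The kind of per-bidder feature engineering described above can be sketched as follows. This is an illustrative sketch only: the record layout (`bidder_id`, `auction_id`, timestamp in seconds), the sample rows, and the four features are assumptions, not the 70+ features actually built for the project.

```python
from collections import defaultdict

# Hypothetical bid records: (bidder_id, auction_id, timestamp_in_seconds).
bids = [
    ("a", "x1", 0), ("a", "x1", 1), ("a", "x2", 2), ("a", "x1", 3),
    ("b", "x1", 10), ("b", "x3", 500),
]


def bidder_features(bids, window=5):
    """Aggregate raw bid rows into one feature dict per bidder."""
    times = defaultdict(list)
    auctions = defaultdict(set)
    for bidder, auction, ts in bids:
        times[bidder].append(ts)
        auctions[bidder].add(auction)

    features = {}
    for bidder, ts_list in times.items():
        ts_list.sort()
        # Sliding window: largest number of bids within any `window` seconds.
        max_in_window, start = 0, 0
        for end in range(len(ts_list)):
            while ts_list[end] - ts_list[start] > window:
                start += 1
            max_in_window = max(max_in_window, end - start + 1)
        # Time gaps between consecutive bids by the same bidder.
        gaps = [later - earlier for earlier, later in zip(ts_list, ts_list[1:])]
        features[bidder] = {
            "total_bids": len(ts_list),
            "unique_auctions": len(auctions[bidder]),
            "max_bids_in_window": max_in_window,
            "mean_gap": sum(gaps) / len(gaps) if gaps else None,
        }
    return features


features = bidder_features(bids)
```

Features like a high `max_bids_in_window` or a very small `mean_gap` are the sort of signal a classifier could use to separate automated bidders from human ones.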

Project Web Scrape: From Badminton World Federation Website

This project was inspired by Loh Kean Yew's historic achievement of becoming the first Singaporean to win a BWF World Championships title, at the 2021 edition held in Huelva, Spain. I wanted to do some simple analysis of his meteoric rise to the top of the badminton world.

To start off, I found that the official BWF website offers some match data that could be useful. Since I had just learnt about web scraping, I decided to write code to scrape Loh Kean Yew's playing data from the BWF website.

Through this project, I implemented the following:

Web scrape using BeautifulSoup and Selenium
Extract data for the chosen athlete (Loh Kean Yew)
Export data to a CSV file
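
The extract-then-export flow above can be sketched as follows. To stay self-contained, this sketch swaps in the standard library's `html.parser` for BeautifulSoup and Selenium, and parses a made-up HTML table rather than the live BWF site. The markup, column names, and rows are all assumptions for illustration.

```python
import csv
import io
from html.parser import HTMLParser

# Made-up HTML standing in for a scraped page; not real BWF markup or results.
SAMPLE_HTML = """
<table>
  <tr><th>tournament</th><th>round</th><th>result</th></tr>
  <tr><td>Example Open</td><td>R32</td><td>W</td></tr>
  <tr><td>Example Open</td><td>R16</td><td>L</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of each <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


parser = TableParser()
parser.feed(SAMPLE_HTML)

# Write to an in-memory buffer; to write a real file, pass a handle from
# open("matches.csv", "w", newline="") instead (hypothetical filename).
buffer = io.StringIO()
csv.writer(buffer).writerows(parser.rows)
csv_text = buffer.getvalue()
```

A page that builds its tables with JavaScript would need a browser driver such as Selenium to render the HTML first, which is presumably why the project used both tools.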

Project Loh Kean Yew

This project was inspired by Loh Kean Yew's historic achievement of becoming the first Singaporean to win a BWF World Championships title, at the 2021 edition held in Huelva, Spain. I wanted to do some simple analysis of his meteoric rise to the top of the badminton world.

Building on the data scraped from the BWF website, I did a simple analysis of Loh Kean Yew's performance.

Through this project, I implemented the following:

Data wrangling
Data visualisation

Section 2: My learning journey through Dataquest.

Project 1: Mobile Apps Analysis

This project analyses data to help our developers understand what types of apps are likely to attract more users, so as to increase our main source of revenue: in-app ads.

Through this project, I implemented the following:

Opening and exploring data
Cleaning data, such as removing wrong, duplicate, and irrelevant data
Choosing the appropriate data
Analysing data based on chosen parameters

Project 2: Hacker News Analysis

Hacker News is a site started by the startup incubator Y Combinator, where user-submitted stories (known as "posts") receive votes and comments, similar to Reddit. Hacker News is extremely popular in technology and startup circles, and posts that make it to the top of the Hacker News listings can get hundreds of thousands of visitors as a result.

For this project, we're specifically interested in posts with titles that begin with either Ask HN or Show HN. Users submit Ask HN posts to ask the Hacker News community a specific question or Show HN posts to show the Hacker News community a project, product, or just something interesting.

We'll compare these two types of posts to determine the following: (1) do Ask HN or Show HN posts receive more comments on average; and (2) do posts created at a certain time receive more comments on average?

Through this project, I implemented the following:

Extracting the chosen data for analysis
Calculating average for chosen parameters
Sorting and printing values from a list of lists
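
The steps above can be sketched in plain Python as follows. The rows, the column order (`title`, `num_comments`, `created_at`), and the date format are assumptions for illustration, not values taken from the real dataset. For brevity the hourly averages here cover every post rather than only one post type.

```python
from datetime import datetime

# Hypothetical rows: [title, num_comments, created_at].
posts = [
    ["Ask HN: How do you learn SQL?", 10, "8/16/2016 9:55"],
    ["Show HN: My weekend project", 4, "8/16/2016 9:10"],
    ["Ask HN: Best Python books?", 6, "8/17/2016 15:30"],
    ["Show HN: A tiny web scraper", 2, "8/17/2016 15:05"],
    ["Some other story", 50, "8/17/2016 12:00"],
]


def average_comments(rows, prefix):
    """Mean comment count of posts whose title starts with `prefix`."""
    counts = [row[1] for row in rows if row[0].lower().startswith(prefix)]
    return sum(counts) / len(counts) if counts else 0.0


avg_ask = average_comments(posts, "ask hn")
avg_show = average_comments(posts, "show hn")

# Accumulate total comments and post counts per hour of creation.
totals, counts = {}, {}
for title, num_comments, created_at in posts:
    hour = datetime.strptime(created_at, "%m/%d/%Y %H:%M").strftime("%H")
    totals[hour] = totals.get(hour, 0) + num_comments
    counts[hour] = counts.get(hour, 0) + 1

# Build a list of lists with the average first so that sorting orders by it.
avg_by_hour = [[totals[hour] / counts[hour], hour] for hour in totals]
avg_by_hour.sort(reverse=True)

for avg, hour in avg_by_hour:
    print(f"{hour}:00 {avg:.2f} average comments per post")
```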

Project 3: eBay Car Sales Analysis

The aim of this project is to clean the data and analyze the included used car listings. We'll work with a dataset of used cars from eBay Kleinanzeigen, a classifieds section of the German eBay website.

Through this project, I implemented the following:

Data cleaning and exploration using pandas

Project 4: Heavy Traffic Analysis

We're going to analyze a dataset about the westbound traffic on the I-94 Interstate highway.

The goal of our analysis is to determine a few indicators of heavy traffic on I-94. These indicators can be weather type, time of the day, time of the week, etc. For instance, we may find out that the traffic is usually heavier in the summer or when it snows.

Through this project, I implemented the following:

Data cleaning and exploration using pandas
Conduct exploratory data visualisation using matplotlib and seaborn

Project 5: Euro Exchange Rate Analysis

The dataset we will use describes Euro daily exchange rates between 1999 and 2021. We will use this dataset to practise explanatory data visualisation.

Through this project, I implemented the following:

Data cleaning and exploration using pandas
Conduct explanatory data visualisation using matplotlib

Project 6: Clean and Analyse Employee Exit Surveys

In this project, we play the role of a data analyst and pretend that our stakeholders want to know the following:

Are employees who only worked for the institutes for a short period of time resigning due to some kind of dissatisfaction? What about employees who have been there longer?
Are younger employees resigning due to some kind of dissatisfaction? What about older employees?

Through this project, I implemented the following:

Data cleaning and exploration using pandas

Project 7: Analysing Star Wars Survey Results

In this project, the team collected data to address the question of whether "the rest of America realize that “The Empire Strikes Back” is clearly the best of the bunch". To do this, they surveyed Star Wars fans using the online tool SurveyMonkey.

Through this project, I implemented the following:

Data cleaning and exploration using pandas

Project 8: Basic SQL

Simple project to demonstrate basic SQL.

Project 9: Intermediate SQL

Simple project to demonstrate intermediate SQL.

Project 10: Basic Data Analysis in Business

In this project, we imagine we are working for a company that creates data science content, be it books, online articles, videos or interactive text-based platforms. We are tasked with figuring out what the best content to write about is.

To begin, we referenced websites like Stack Exchange. We focus on Data Science Stack Exchange (DSSE) because, unlike the other sites, it is dedicated to data science, and it also has a lot of unanswered questions.

Through this project, I implemented the following:

Understanding the useful and relevant information on the website that could aid in analysis
Data transformation and exploration using pandas
Conduct explanatory data visualisation using matplotlib

Project 11: Investigating Fandango Movie Ratings

In October 2015, a data journalist named Walt Hickey analyzed movie ratings data and found strong evidence to suggest that Fandango's rating system was biased and dishonest.

In this project, we analyze more recent movie ratings data to determine whether there has been any change in Fandango's rating system after Hickey's analysis.

Through this project, I implemented the following:

Data transformation and exploration using pandas
Conduct explanatory data visualisation using matplotlib
Analysing the data with basic statistical methods

Project 12: Finding the Best Markets to Advertise In

In this project, we assume that we are working for an e-learning company that offers courses on programming. Most of our courses are on web and mobile development, but we also cover many other domains, like data science, game development, etc. We want to promote our product and we'd like to invest some money in advertisement. Our goal in this project is to find out the two best markets to advertise our product in.

Through this project, I implemented the following:

Data transformation and exploration using pandas
Conduct explanatory data visualisation using matplotlib/seaborn
Identifying and removing outliers
Using basic statistics to support decisions

Project 13: Predicting Car Prices

In this project, we practised a basic machine learning workflow and made use of the k-nearest neighbors algorithm to predict a car's market price using its attributes.

Through this project, I implemented the following:

Using a Univariate Model
Using a Multivariate Model
Hyperparameter Tuning
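
A minimal sketch of the univariate model and the tuning step, written in plain Python rather than with the library used in the project. The (horsepower, price) pairs and the candidate values of `k` are made up for illustration.

```python
# Hypothetical (horsepower, price) pairs; not from the project's dataset.
train = [(70, 6000), (90, 8000), (110, 10000), (150, 16000), (200, 30000)]
test = [(100, 9500), (160, 18000)]


def knn_predict(train, x, k):
    """Predict a price as the mean price of the k nearest training cars."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(price for _, price in nearest) / k


def rmse(train, test, k):
    """Root mean squared error of k-NN predictions on the test rows."""
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in test]
    return (sum(errors) / len(errors)) ** 0.5


# Hyperparameter tuning: try several k values and keep the lowest error.
rmse_by_k = {k: rmse(train, test, k) for k in (1, 3, 5)}
best_k = min(rmse_by_k, key=rmse_by_k.get)
```

A multivariate version would replace the absolute difference with a distance over several attributes, such as Euclidean distance on normalised columns.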

Project 14: Predicting House Sales

In this project, we explored ways to improve the machine learning models using housing data for the city of Ames, Iowa, United States from 2006 to 2010.

Through this project, I implemented the following:

Basic feature engineering and transformation of data
Simple linear regression model
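
A simple linear regression with a single feature can be sketched with ordinary least squares as follows. The (living area, sale price) pairs are invented for illustration and are not from the Ames data; the project itself may have used a library implementation.

```python
# Hypothetical (living area in square feet, sale price) pairs.
areas = [1000, 1500, 2000, 2500]
prices = [150000, 200000, 250000, 300000]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (unnormalised; the 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(areas, prices)

# Predict the price of a hypothetical 1,800 sq ft house.
predicted = slope * 1800 + intercept
```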

Project 15: Predicting Bike Rentals

In this project, we tried to predict the total number of bikes people rented in a given hour.

Through this project, I implemented the following:

Analysing the Data
Simple Feature Engineering
Build and compare various models
