This is where I host some of my data analytics projects.
This Prediction Challenge is modelled after the 2015 Facebook Recruitment Challenge, a prediction competition jointly held by Facebook and Kaggle.
In this competition, we are chasing down robots for an online auction site. Human bidders on the site are becoming increasingly frustrated with their inability to win auctions vs. their software-controlled counterparts. As a result, usage from the site's core customer base is plummeting.
To rebuild customer happiness, the site owners need to eliminate computer-generated bidding from their auctions. Their attempt at building a model to identify these bids using behavioral data, including bid frequency over short periods of time, has proven insufficient.
The goal of this competition is to identify online auction bids that are placed by "robots", helping the site owners easily flag these users for removal from their site to prevent unfair auction activity.
Through this project, I implemented the following:
- More than 70 features created via feature engineering
- Hyperparameter tuning for a Random Forest model
- Achieved a public score of 0.90561 and private score of 0.89190.
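The behavioural signal mentioned above, bid frequency over short periods, can be turned into per-bidder features. The following is only a minimal sketch of that idea, not the project's actual feature set: the `(bidder_id, auction_id, timestamp)` record layout, the feature names and the function name `bidder_features` are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def bidder_features(bids):
    """Aggregate raw bids into one feature row per bidder.

    `bids` is an iterable of (bidder_id, auction_id, timestamp) tuples;
    this layout is assumed for illustration.
    """
    by_bidder = defaultdict(list)
    for bidder_id, auction_id, ts in bids:
        by_bidder[bidder_id].append((ts, auction_id))

    features = {}
    for bidder_id, rows in by_bidder.items():
        rows.sort()  # chronological order
        times = [ts for ts, _ in rows]
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        features[bidder_id] = {
            "n_bids": len(rows),
            "n_auctions": len({auction for _, auction in rows}),
            # Very short gaps between consecutive bids hint at automation.
            "median_gap": median(gaps) if gaps else None,
        }
    return features


# Made-up bids: one bidder fires every second, the other waits ten minutes.
bids = [
    ("bot", "a1", 0), ("bot", "a1", 1), ("bot", "a2", 2), ("bot", "a2", 3),
    ("human", "a1", 0), ("human", "a1", 600),
]
features = bidder_features(bids)
```

Rows like these could then be fed to a classifier such as the Random Forest mentioned in the list above.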
Inspired by Loh Kean Yew's historic achievement of becoming the first Singaporean to win a BWF World Championships title, at the BWF World Championships 2021 held in Huelva, Spain, I wanted to do some simple analysis on his meteoric rise to the top of the badminton world.
To start off, I found that the official BWF website offers some match data that could be useful. Since I had just learnt about web scraping, I decided to write code to scrape Loh Kean Yew's playing data from the BWF website.
Through this project, I implemented the following:
- Web scraping using BeautifulSoup and Selenium
- Extracting data for the chosen athlete (Loh Kean Yew)
- Exporting data to a CSV file
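The parse-then-export flow can be sketched as follows. To keep the example free of third-party dependencies, it swaps BeautifulSoup and Selenium for the standard library's `html.parser`, and it parses a made-up HTML table rather than the BWF site's real markup, which is not reproduced here. The class name `TableParser` and the column names are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up markup standing in for a downloaded results page.
html = """
<table>
  <tr><th>Tournament</th><th>Round</th><th>Result</th></tr>
  <tr><td>Example Open</td><td>Final</td><td>W</td></tr>
</table>
"""

parser = TableParser()
parser.feed(html)

# Written to memory here; open("matches.csv", "w", newline="") would write a file.
buffer = io.StringIO()
csv.writer(buffer).writerows(parser.rows)
csv_text = buffer.getvalue()
```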
Inspired by Loh Kean Yew's historic achievement of becoming the first Singaporean to win a BWF World Championships title, at the BWF World Championships 2021 held in Huelva, Spain, I wanted to do some simple analysis on his meteoric rise to the top of the badminton world.
Building on the data scraped from the BWF website, I did a simple analysis of Loh Kean Yew's performance.
Through this project, I implemented the following:
- Data wrangling
- Data visualisation
This project analyses data to help our developers understand what types of apps are likely to attract more users, so as to increase our main source of revenue, in-app ads.
Through this project, I implemented the following:
- Opening and exploring data
- Cleaning data, such as removing wrong, duplicate, and irrelevant data
- Choosing the appropriate data
- Analysing data based on chosen parameters
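As a rough illustration of the cleaning step, the sketch below drops rows with the wrong number of columns and removes duplicates. The `[name, genre, n_reviews]` layout, the rule of keeping the duplicate with the most reviews, and the function name `clean_apps` are assumptions, not details taken from the project.

```python
def clean_apps(rows, n_columns=3):
    """Drop malformed rows and keep one entry per app name.

    Each row is assumed to be [name, genre, n_reviews]. When a name appears
    more than once, the entry with the highest review count is kept.
    """
    best = {}
    for row in rows:
        if len(row) != n_columns:  # wrong data: a column is missing or extra
            continue
        name, genre, n_reviews = row
        n_reviews = int(n_reviews)
        if name not in best or n_reviews > best[name][2]:
            best[name] = [name, genre, n_reviews]
    return list(best.values())


# Made-up rows: one duplicate and one row with a missing column.
rows = [
    ["Chat", "Social", "100"],
    ["Chat", "Social", "250"],
    ["Broken", "Tools"],
    ["Notes", "Tools", "40"],
]
cleaned = clean_apps(rows)
```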
Hacker News is a site started by the startup incubator Y Combinator, where user-submitted stories (known as "posts") receive votes and comments, similar to Reddit. Hacker News is extremely popular in technology and startup circles, and posts that make it to the top of the Hacker News listings can get hundreds of thousands of visitors as a result.
For this project, we're specifically interested in posts with titles that begin with either Ask HN or Show HN. Users submit Ask HN posts to ask the Hacker News community a specific question or Show HN posts to show the Hacker News community a project, product, or just something interesting.
We'll compare these two types of posts to determine the following: (1) do Ask HN or Show HN posts receive more comments on average; and (2) do posts created at a certain time receive more comments on average?
Through this project, I implemented the following:
- Extracting the chosen data for analysis
- Calculating average for chosen parameters
- Sorting and printing values from a list of lists
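The two questions above can be sketched in plain Python as follows. The `[title, num_comments, created_at]` row layout, the timestamp format and the sample posts are made up for illustration; the real dataset's columns may differ.

```python
from collections import defaultdict
from datetime import datetime


def post_type(title):
    """Classify a post by the prefix of its title."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return None


def average_by(posts, key):
    """Average the comment count of `posts`, grouped by key(post)."""
    totals = defaultdict(lambda: [0, 0])  # group -> [comment sum, post count]
    for post in posts:
        group = key(post)
        if group is None:  # neither Ask HN nor Show HN
            continue
        totals[group][0] += post[1]
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


posts = [
    ["Ask HN: How do you learn SQL?", 10, "8/16/2016 9:55"],
    ["Ask HN: Favourite editor?", 30, "8/16/2016 15:10"],
    ["Show HN: My weekend project", 6, "8/17/2016 15:45"],
    ["Some other story", 99, "8/17/2016 9:00"],
]

# Question 1: average comments per post type.
by_type = average_by(posts, key=lambda p: post_type(p[0]))

# Question 2: average comments per hour of creation, for Ask HN posts.
ask_posts = [p for p in posts if post_type(p[0]) == "ask"]
by_hour = average_by(
    ask_posts,
    key=lambda p: datetime.strptime(p[2], "%m/%d/%Y %H:%M").hour,
)

# Sort a list of [average, hour] lists, highest average first, and print it.
ranked = sorted(([avg, hour] for hour, avg in by_hour.items()), reverse=True)
for avg, hour in ranked:
    print(f"{hour:02d}:00 {avg:.2f} average comments per post")
```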
The aim of this project is to clean the data and analyze the included used car listings. We'll work with a dataset of used cars from eBay Kleinanzeigen, a classifieds section of the German eBay website.
Through this project, I implemented the following:
- Data cleaning and exploration using pandas
We're going to analyze a dataset about the westbound traffic on the I-94 Interstate highway.
The goal of our analysis is to determine a few indicators of heavy traffic on I-94. These indicators can be weather type, time of the day, time of the week, etc. For instance, we may find out that the traffic is usually heavier in the summer or when it snows.
Through this project, I implemented the following:
- Data cleaning and exploration using pandas
- Exploratory data visualisation using matplotlib and seaborn
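Looking for indicators such as time of the day or time of the week comes down to grouping traffic volume by a time-based key and comparing the averages. Here is a minimal sketch in plain Python (the project itself used pandas); the `(timestamp, volume)` layout and the readings are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_volume_by(records, key):
    """Average traffic volume grouped by key(datetime of the reading)."""
    groups = defaultdict(list)
    for timestamp, volume in records:
        groups[key(datetime.fromisoformat(timestamp))].append(volume)
    return {group: mean(values) for group, values in groups.items()}


# Made-up hourly readings.
records = [
    ("2016-07-04 08:00:00", 5000),  # Monday
    ("2016-07-05 08:00:00", 6000),  # Tuesday
    ("2016-07-09 08:00:00", 2000),  # Saturday
    ("2016-07-09 23:00:00", 1000),  # Saturday night
]

# Time of the week: Monday-Friday versus Saturday-Sunday.
by_day_type = mean_volume_by(
    records,
    key=lambda dt: "weekend" if dt.weekday() >= 5 else "business day",
)

# Time of the day: one average per hour.
by_hour = mean_volume_by(records, key=lambda dt: dt.hour)
```

The same helper works for other candidate indicators, for example `key=lambda dt: dt.month` for seasonality.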
The dataset we will use describes Euro daily exchange rates between 1999 and 2021. We will use this dataset to practise explanatory data visualisation.
Through this project, I implemented the following:
- Data cleaning and exploration using pandas
- Explanatory data visualisation using matplotlib
In this project, we play the role of a data analyst and pretend that our stakeholders want to know the following:
- Are employees who only worked for the institutes for a short period of time resigning due to some kind of dissatisfaction? What about employees who have been there longer?
- Are younger employees resigning due to some kind of dissatisfaction? What about older employees?
Through this project, I implemented the following:
- Data cleaning and exploration using pandas
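The stakeholder questions above reduce to comparing the share of dissatisfied leavers between groups. A minimal sketch for the tenure question follows; the `(years_of_service, dissatisfied)` layout, the 3-year cut-off and the sample records are placeholders rather than values from the project.

```python
def dissatisfaction_rate(records, short_tenure_years=3):
    """Share of dissatisfied leavers among short- and long-tenure employees.

    `records` holds (years_of_service, dissatisfied) pairs. The default
    3-year cut-off is an arbitrary placeholder.
    """
    counts = {"short": [0, 0], "long": [0, 0]}  # group -> [dissatisfied, total]
    for years, dissatisfied in records:
        group = "short" if years < short_tenure_years else "long"
        counts[group][0] += int(dissatisfied)
        counts[group][1] += 1
    return {g: d / n for g, (d, n) in counts.items() if n}


# Made-up exit-survey records.
records = [
    (1, False), (2, True), (2, False), (0.5, False),
    (7, True), (12, True), (20, False), (9, False),
]
rates = dissatisfaction_rate(records)
```

The age question can be handled the same way by bucketing on age instead of years of service.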
In this project, the team collected data to address the question: does "the rest of America realize that “The Empire Strikes Back” is clearly the best of the bunch"? To do this, they surveyed Star Wars fans using the online tool SurveyMonkey.
Through this project, I implemented the following:
- Data cleaning and exploration using pandas
Simple project to demonstrate basic SQL.
Simple project to demonstrate intermediate SQL.
In this project, we imagine we are working for a company that creates data science content, be it books, online articles, videos or interactive text-based platforms. We are tasked with figuring out what the best content to write about is.
To begin, we looked at websites like Stack Exchange. We focus on Data Science Stack Exchange (DSSE) because it is a site dedicated to data science (unlike the others) and it has a lot of unanswered questions.
Through this project, I implemented the following:
- Understanding the useful and relevant information on the website that could aid in analysis
- Data transformation and exploration using pandas
- Explanatory data visualisation using matplotlib
In October 2015, a data journalist named Walt Hickey analyzed movie ratings data and found strong evidence to suggest that Fandango's rating system was biased and dishonest.
In this project, we analyze more recent movie ratings data to determine whether there has been any change in Fandango's rating system after Hickey's analysis.
Through this project, I implemented the following:
- Data transformation and exploration using pandas
- Explanatory data visualisation using matplotlib
- Analysing the data with basic statistical methods
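The "basic statistical methods" step can be as simple as comparing the mean, median and mode of two rating samples. The sketch below uses the standard library's `statistics` module; the ratings are invented for illustration and are not Fandango data.

```python
from statistics import mean, median, mode


def summarise(ratings):
    """Return the three summary statistics used for the comparison."""
    return {
        "mean": round(mean(ratings), 2),
        "median": median(ratings),
        "mode": mode(ratings),
    }


# Invented star ratings for two periods (not real data).
ratings_before = [4.5, 4.5, 4.0, 5.0, 4.5, 3.5]
ratings_after = [4.0, 4.0, 3.5, 4.5, 4.0, 3.5]

before = summarise(ratings_before)
after = summarise(ratings_after)

# A positive difference means ratings were higher in the earlier sample.
shift = {name: round(before[name] - after[name], 2) for name in before}
```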
In this project, we assume that we are working for an e-learning company that offers courses on programming. Most of our courses are on web and mobile development, but we also cover many other domains, like data science, game development, etc. We want to promote our product and we'd like to invest some money in advertisement. Our goal in this project is to find out the two best markets to advertise our product in.
Through this project, I implemented the following:
- Data transformation and exploration using pandas
- Explanatory data visualisation using matplotlib/seaborn
- Identifying and removing outliers
- Using basic statistics to inform decisions
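To show why outlier removal matters when ranking markets, the sketch below computes the mean monthly spend per market with and without a spending cap. The `(country, monthly_spend)` layout, the cap of 1000 and the numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_spend_by_country(rows, max_monthly_spend):
    """Mean monthly spend per country, ignoring rows above the cap.

    `rows` holds (country, monthly_spend) pairs; both the layout and the
    cap are assumptions for illustration.
    """
    spend = defaultdict(list)
    for country, monthly in rows:
        if monthly <= max_monthly_spend:
            spend[country].append(monthly)
    return {country: mean(values) for country, values in spend.items()}


# Made-up survey rows; one respondent in market "A" reports an extreme value.
rows = [("A", 50), ("A", 70), ("A", 5000), ("B", 30), ("B", 50)]

with_outliers = mean_spend_by_country(rows, max_monthly_spend=float("inf"))
without_outliers = mean_spend_by_country(rows, max_monthly_spend=1000)
```

A single extreme respondent inflates the mean for market "A" by more than an order of magnitude, which could change which market looks most attractive.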
In this project, we practised a basic machine learning workflow and used the k-nearest neighbors algorithm to predict a car's market price from its attributes.
Through this project, I implemented the following:
- Using a Univariate Model
- Using a Multivariate Model
- Hyperparameter Tuning
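The three steps above can be sketched with a hand-written k-nearest neighbors regressor. This is an illustration in plain Python, not the project's code: the feature names `horsepower` and `weight`, the already-normalised values and the prices are all made up.

```python
import math


def knn_predict(train, query, features, k):
    """Predict a price as the mean price of the k nearest training cars.

    Distance is Euclidean over the listed `features`.
    """
    def distance(car):
        return math.sqrt(sum((car[f] - query[f]) ** 2 for f in features))

    nearest = sorted(train, key=distance)[:k]
    return sum(car["price"] for car in nearest) / k


def rmse(train, test, features, k):
    """Root mean squared error of the predictions on `test`."""
    errors = [
        (knn_predict(train, car, features, k) - car["price"]) ** 2
        for car in test
    ]
    return math.sqrt(sum(errors) / len(errors))


# Made-up cars with features already scaled to the 0-1 range.
train = [
    {"horsepower": 0.1, "weight": 0.2, "price": 6000},
    {"horsepower": 0.2, "weight": 0.3, "price": 8000},
    {"horsepower": 0.8, "weight": 0.7, "price": 20000},
    {"horsepower": 0.9, "weight": 0.9, "price": 24000},
]
test = [
    {"horsepower": 0.12, "weight": 0.22, "price": 7000},
    {"horsepower": 0.88, "weight": 0.85, "price": 22000},
]

# Univariate model: a single feature.
univariate_rmse = rmse(train, test, ["horsepower"], k=1)

# Multivariate model with hyperparameter tuning: try several values of k.
scores = {k: rmse(train, test, ["horsepower", "weight"], k) for k in range(1, 5)}
best_k = min(scores, key=scores.get)
```

Features need to be on comparable scales before computing distances, which is why the sample values are normalised.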
In this project, we explored ways to improve the machine learning models using housing data for the city of Ames, Iowa, United States from 2006 to 2010.
Through this project, I implemented the following:
- Basic feature engineering and transformation of data
- Simple linear regression model
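Both steps fit in a few lines: derive a new feature from existing columns, then fit a one-variable least-squares line. The sketch below is an assumption-laden illustration: the `(year_built, year_sold, price)` layout, the derived feature `age_at_sale` and the prices are invented, not taken from the Ames data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up houses: (year_built, year_sold, price).
houses = [
    (2010, 2010, 200000),
    (2000, 2010, 180000),
    (1988, 2008, 160000),
    (1976, 2006, 140000),
]

# Feature engineering: the age of the house when it was sold.
age_at_sale = [sold - built for built, sold, _ in houses]
prices = [price for _, _, price in houses]

slope, intercept = fit_line(age_at_sale, prices)
predicted_price = slope * 15 + intercept  # a hypothetical 15-year-old house
```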
In this project, we tried to predict the total number of bikes people rented in a given hour.
Through this project, I implemented the following:
- Analysing the Data
- Simple Feature Engineering
- Building and comparing various models
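As a small illustration of engineering a feature and then comparing models, the sketch below buckets the hour into a time-of-day label and compares two deliberately trivial predictors by mean absolute error. The bucket boundaries, the stand-in "models" and the rental counts are assumptions; the project's actual models are not shown here.

```python
from collections import defaultdict
from statistics import mean


def time_label(hour):
    """Bucket an hour into 1 morning, 2 afternoon, 3 evening, 4 night."""
    if 6 <= hour < 12:
        return 1
    if 12 <= hour < 18:
        return 2
    if 18 <= hour < 24:
        return 3
    return 4


def fit_global_mean(train):
    """Baseline: always predict the overall mean rental count."""
    overall = mean(count for _, count in train)
    return lambda hour: overall


def fit_label_mean(train):
    """Predict the mean rental count of the hour's time-of-day bucket."""
    groups = defaultdict(list)
    for hour, count in train:
        groups[time_label(hour)].append(count)
    means = {label: mean(counts) for label, counts in groups.items()}
    return lambda hour: means[time_label(hour)]


def mean_absolute_error(model, test):
    return mean(abs(model(hour) - count) for hour, count in test)


# Made-up (hour, rentals) pairs.
train = [(8, 300), (9, 340), (14, 200), (15, 220), (20, 150), (2, 20), (3, 30)]
test = [(10, 320), (13, 210), (1, 25)]

mae_global = mean_absolute_error(fit_global_mean(train), test)
mae_label = mean_absolute_error(fit_label_mean(train), test)
```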