
A Python program to statistically analyze a dataset of 10,000 Tweets (from Twitter) for the Computational Social Media course (CS205) at Fulbright University Vietnam.


quynhanhninh/cs205-twitter-data-analysis


Twitter data analysis - Assignment #2 for Computational Social Media course (Fulbright University Vietnam)

Description

This is a Python program I created to statistically analyze a dataset of 10,000 Tweets (from Twitter) for the Computational Social Media course (CS205) at Fulbright University Vietnam in the Fall term, academic year 2021-2022.

The program uses the pandas, numpy, and math libraries alongside core Python concepts such as string manipulation, lists, dictionaries, and loops.

Features

  • Open and read the Tweets data from the 'twitter_covid_fuv_2021.xlsx' file.
  • Process the dataset and return the following descriptive statistics, as required in the assignment:
    • Percentage of tweets that contain URLs.
    • Percentage of tweets that are (or contain) retweets.
    • Percentage of tweets that contain vaccination hashtags/keywords (%pfizer, %moderna, %astrazeneca, %janssen, %verocell).
    • Distribution of languages declared in the tweet metadata (%EN, %FR, ...).
    • Table of the 30 most frequent hashtags in the following format: [rank, hashtag, frequency]. Example: [1, #coronavirus, 2500].
    • Percentage of tweets directly generated by all the 20 media accounts together. e.g.: 3% of tweets were produced by the 20 media accounts altogether.
    • Percentage of tweets directly generated by the 20 NGOs/gov. accounts. e.g.: 5% of tweets were produced by the 20 NGOs/government accounts.
    • Percentage of tweets generated by all the 20 media accounts that appear as retweets.
    • Percentage of tweets generated by all the 20 NGOs/gov. accounts that appear as retweets.
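
The statistics listed above could be computed along the following lines. This is a minimal sketch, not the program's actual code: it uses plain lists and dictionaries instead of pandas so that it runs without the Excel file. The field names `text`, `lang`, and `user`, the sample rows, and the "RT @" prefix rule for detecting retweets are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical sample rows. The real column names in the Excel file are not
# documented in this README, so "user", "lang", and "text" are assumptions.
SAMPLE_TWEETS = [
    {"user": "who", "lang": "en", "text": "Get vaccinated #COVID19 https://example.org"},
    {"user": "alice", "lang": "en", "text": "RT @who: Get vaccinated #COVID19"},
    {"user": "bob", "lang": "fr", "text": "Dose de Pfizer aujourd'hui #covid19 #vaccin"},
    {"user": "carol", "lang": "en", "text": "Staying home today"},
]

URL_PATTERN = re.compile(r"https?://\S+")
HASHTAG_PATTERN = re.compile(r"#\w+")


def percentage(tweets, predicate):
    """Return the share of tweets (0-100) for which predicate(tweet) is true."""
    if not tweets:
        return 0.0
    matches = sum(1 for tweet in tweets if predicate(tweet))
    return 100.0 * matches / len(tweets)


def has_url(tweet):
    return URL_PATTERN.search(tweet["text"]) is not None


def is_retweet(tweet):
    # Assumption: retweets carry the conventional "RT @" prefix in their text.
    return tweet["text"].startswith("RT @")


def mentions_keyword(tweet, keyword):
    # Case-insensitive substring match, e.g. for "pfizer" or "moderna".
    return keyword.lower() in tweet["text"].lower()


def language_distribution(tweets):
    """Map each declared language code to its percentage of all tweets."""
    counts = Counter(tweet["lang"] for tweet in tweets)
    return {lang: 100.0 * n / len(tweets) for lang, n in counts.items()}


def top_hashtags(tweets, limit=30):
    """Return rows of [rank, hashtag, frequency], most frequent first."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tag.lower() for tag in HASHTAG_PATTERN.findall(tweet["text"]))
    ranked = counts.most_common(limit)
    return [[rank, tag, freq] for rank, (tag, freq) in enumerate(ranked, start=1)]


def percentage_from_accounts(tweets, accounts):
    """Share of tweets written directly by any account in the given list."""
    wanted = {name.lower() for name in accounts}
    return percentage(tweets, lambda tweet: tweet["user"].lower() in wanted)
```

For example, `percentage(SAMPLE_TWEETS, has_url)` gives the URL share, and `percentage(SAMPLE_TWEETS, lambda t: mentions_keyword(t, "pfizer"))` gives the share mentioning one vaccine keyword. The last two statistics in the list could reuse the same helpers by first filtering the tweets to those from the media or NGO/government accounts and then applying `is_retweet`.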

Other information

The Twitter data in the Excel file was provided by our course instructor for educational use within the course only. I am grateful to our instructor, Thay Phan Thanh Trung, who kindly gave us access to the dataset as well as clear instructions and requirements for this analysis assignment.

#python #university #computerscience #dataanalysis #learning #pythonproject
