The dataset consists of about 42,000 movie summaries scraped from Wikipedia. Following the given guidelines, I use natural language processing (NLP) tools in R to conduct univariate and multivariate exploration of the dataset, summarized below:
- find the most produced and the most profitable movie genres
- identify common characteristics in movie summaries
- compare word usage in the top five most produced genres with Zipf’s law
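The Zipf’s-law comparison above can be sketched with tidytext: tokenize the summaries, rank words by frequency within each genre, and check whether log frequency falls roughly linearly in log rank. This is a minimal illustration on a toy corpus, not the project’s actual code; the column names (`genre`, `text`) are assumptions.

```r
# Minimal sketch of a Zipf's-law check, using a toy corpus in place of
# the real movie summaries (column names are illustrative).
library(dplyr)
library(tidytext)

summaries <- tibble(
  genre = c("Drama", "Drama", "Comedy"),
  text  = c("a quiet family drama unfolds",
            "the family returns home again",
            "a comedy of errors at home")
)

# Tokenize, count words per genre, then rank by frequency within genre
word_freq <- summaries %>%
  unnest_tokens(word, text) %>%
  count(genre, word, sort = TRUE) %>%
  group_by(genre) %>%
  mutate(rank = row_number(),
         term_frequency = n / sum(n)) %>%
  ungroup()

# Under Zipf's law, log10(term_frequency) is roughly linear in
# log10(rank) with a slope near -1; a simple fit checks how closely
# the corpus follows it.
fit <- lm(log10(term_frequency) ~ log10(rank), data = word_freq)
coef(fit)
```

On the real dataset the same pipeline runs per genre, and plotting `term_frequency` against `rank` on log–log axes with ggplot2 makes the comparison visual.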
- Text mining packages: tm, tidytext
- Data visualization: ggplot2
- Natural language processing: spaCy (wrapped in cleanNLP), topicmodels
- Data manipulation: tidyverse
- R version 3.5.1
- Python 3 (backend for spaCy)
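Since cleanNLP wraps the Python spaCy library, the annotation backend has to be initialized before any text is processed. A minimal sketch of that setup, assuming spaCy and an English model are installed in the Python 3 environment (the model name `en_core_web_sm` is an assumption):

```r
# Sketch of initializing the spaCy backend via cleanNLP; requires a
# Python 3 installation with spaCy and the named model available.
library(cleanNLP)

cnlp_init_spacy(model_name = "en_core_web_sm")

# Annotate a single illustrative summary; the result holds token-level
# annotations (lemma, part of speech, etc.)
anno <- cnlp_annotate("A young farm boy joins a rebellion against an empire.")
head(anno$token)
```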
The RDS files saved and loaded by the code may not be available in the repository due to GitHub's push size limit.
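One way to keep the code runnable when a committed RDS file is missing is a simple cache pattern: load the file if it exists, otherwise recompute and save it. A minimal sketch (the file name and placeholder computation are illustrative):

```r
# Cache pattern around saveRDS/readRDS: reuse the saved object when
# present, otherwise recompute it and write the cache.
path <- "word_counts.rds"
if (file.exists(path)) {
  word_counts <- readRDS(path)
} else {
  word_counts <- table(c("drama", "drama", "comedy"))  # placeholder computation
  saveRDS(word_counts, path)
}
```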
- HTML: copy the summary link into this GitHub HTML reader
- PDF (LaTeX equations fail to display): Project Summary
- Xiaoxuan Yang: [xy77@duke.edu]