elfgk/Titanic-Data-Analysis

Titanic Data Analysis

This project focuses on analyzing the Titanic dataset, which includes information about passengers aboard the RMS Titanic. The goal is to explore the data and build a machine learning model to predict passenger survival based on features such as age, class, gender, and ticket information.

Dataset: https://www.kaggle.com/competitions/titanic

Project Overview

The project involves the following steps:

  1. Data Exploration:

    • The Titanic dataset is explored to understand each feature and how the features relate to one another and to survival. Basic cleaning is also done at this stage, ahead of the fuller preprocessing in the next step.
  2. Data Preprocessing:

    • The dataset is cleaned by handling missing values, encoding categorical variables, and scaling features to prepare it for machine learning.
  3. Model Building:

    • A machine learning model (e.g., Logistic Regression, Decision Trees, Random Forest) is built to predict the survival of passengers.
  4. Model Evaluation:

    • The performance of the model is evaluated using metrics such as accuracy, precision, recall, and F1 score. Cross-validation and hyperparameter tuning are also performed to optimize the model's performance.
  5. Visualization:

    • Various visualizations are created using libraries like matplotlib and seaborn to better understand the dataset and the relationships between features.
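The preprocessing, model building, and evaluation steps above can be sketched as a single scikit-learn pipeline. This is a minimal illustration, not necessarily what the notebook does: the column groupings, the median and most-frequent imputation, the choice of Logistic Regression, and the helper names `build_pipeline` and `evaluate` are all assumptions made for the example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed feature split; the notebook may select different columns.
NUMERIC = ["Age", "Fare", "SibSp", "Parch"]
CATEGORICAL = ["Pclass", "Sex", "Embarked"]


def build_pipeline() -> Pipeline:
    """Impute, scale, and encode the features, then fit a classifier."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),   # fill missing ages
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, NUMERIC),
        ("cat", categorical, CATEGORICAL),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", LogisticRegression(max_iter=1000)),
    ])


def evaluate(df: pd.DataFrame, folds: int = 5) -> dict:
    """Return the mean cross-validated score for each metric."""
    X = df[NUMERIC + CATEGORICAL]
    y = df["Survived"]
    pipeline = build_pipeline()
    return {
        metric: cross_val_score(pipeline, X, y, cv=folds, scoring=metric).mean()
        for metric in ("accuracy", "precision", "recall", "f1")
    }


# Usage, assuming Kaggle's train.csv is in the working directory:
# scores = evaluate(pd.read_csv("train.csv"))
```

Keeping the imputers, scaler, and encoder inside the pipeline means they are refit on each training fold, so the cross-validated scores are not inflated by information leaking from the held-out fold.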

Dataset

The dataset used in this project is the Titanic dataset from Kaggle, which contains the following columns:

  • PassengerId: Unique ID for each passenger.
  • Pclass: Passenger class (1st, 2nd, or 3rd).
  • Name: Name of the passenger.
  • Sex: Gender of the passenger.
  • Age: Age of the passenger.
  • SibSp: Number of siblings or spouses aboard the Titanic.
  • Parch: Number of parents or children aboard the Titanic.
  • Ticket: Ticket number.
  • Fare: Fare paid by the passenger.
  • Cabin: Cabin number.
  • Embarked: Port of embarkation (C = Cherbourg; Q = Queenstown; S = Southampton).
  • Survived: Survival status (0 = No, 1 = Yes). This is the prediction target and is present only in the training file (train.csv).
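Several of these columns contain gaps (in the Kaggle training file, Age, Cabin, and Embarked have missing values), so a quick missing-value summary is a useful first check. The sketch below is one way to do it with pandas; the helper name `missing_summary` is hypothetical and not taken from the notebook.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """List columns with missing values, most affected first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    # Keep only columns that actually have gaps.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Usage, assuming Kaggle's train.csv is in the working directory:
# print(missing_summary(pd.read_csv("train.csv")))
```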

Libraries Used

  • pandas: For data manipulation and analysis.
  • numpy: For numerical operations.
  • matplotlib and seaborn: For data visualization.
  • scikit-learn: For building and evaluating machine learning models.
  • xgboost (optional): For gradient-boosted tree models, which may improve prediction accuracy.

Getting Started

To get started with this project, follow these steps:

  1. Clone or download the repository:

    git clone https://github.com/elfgk/Titanic-Data-Analysis.git
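  2. Install the required Python libraries listed under Libraries Used (pandas, numpy, matplotlib, seaborn, scikit-learn, and optionally xgboost), along with Jupyter to run the notebook.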

  3. Open the titanic_data_analysis.ipynb Jupyter notebook and follow the steps for data exploration, preprocessing, model building, and evaluation.

𓍢ִ໋☕️✧˚ ༘ ⋆

Contact Me🧑‍💻:

LinkedIn Stack Overflow Hugging Face Kaggle
