Baby Cry Analysis for Research Paper

Overview

Welcome to the GitHub repository dedicated to the research paper titled 'Evaluating Convolutional Neural Networks and Vision Transformers for Baby Cry Sound Analysis' authored by Samir Ashraf, Noha S. Tawfik and Dalia Sobhy. This repository contains the code and resources used in the research, aiming to explore a more personalized approach to understanding the reasons behind a baby's cry through machine learning techniques.

Project Structure

CNNmc: Implementation of a Multi-Input Convolutional Neural Network (CNN) using Mel spectrogram and CQT spectrogram features.
CNNfm: Implementation of a Multi-Input CNN using Mel spectrogram and MFCC features.
SVM: Support Vector Machine (SVM) with VGG16 feature extraction for classification.
spectoT: Code to preprocess audio files and extract CQT and Mel spectrograms.
augmentation: Audio augmentation techniques to enhance dataset diversity.
ViT: Training a Vision Transformer (ViT) from scratch on smaller datasets with specialized attention mechanisms.

How to Use

Data Preparation: Prepare your baby cry audio dataset. The spectoT folder assists in extracting necessary spectrogram features.
Model Selection: Choose the appropriate model from CNNmc, CNNfm, SVM, or ViT directories for training and evaluating models.
Augmentation: Utilize the augmentation section to apply augmentation methods to audio files.
Results and Analysis: Analyze model performance and insights obtained for personalized infant care.

Inside the ViT directory, you'll find two options for ViT models:

Vanilla ViT: The Vanilla Vision Transformer model can be executed by setting run_vanilla_vit = True in the code.
Modified ViT: The Modified Vision Transformer model, which incorporates shifted patch tokenization and locality self-attention, can be executed by setting run_vanilla_vit = False.

Please note that the Modified ViT has been our best-performing model in our experiments, delivering superior results.

Research Paper

This repository aligns with the research paper 'Evaluating Convolutional Neural Networks and Vision Transformers for Baby Cry Sound Analysis' authored by Samir Ashraf, Noha S. Tawfik and Dalia Sobhy. Please cite this repository and the paper if you find the resources helpful for your work.

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
CNNmc.ipynb		CNNmc.ipynb
CNNmf.ipynb		CNNmf.ipynb
README.md		README.md
VIT.ipynb		VIT.ipynb
augmentation.ipynb		augmentation.ipynb
spectoT.ipynb		spectoT.ipynb
svm.ipynb		svm.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Baby Cry Analysis for Research Paper

Overview

Project Structure

How to Use

Research Paper

About

Releases

Packages

Languages

SamirElgehiny/baby-cry-analysis

Folders and files

Latest commit

History

Repository files navigation

Baby Cry Analysis for Research Paper

Overview

Project Structure

How to Use

Research Paper

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages