Detecting Hallucination in ALMs (Name is temporary)
Group #34 project for 11-785: Intro to Deep Learning at CMU
This downloads the AudioCaps annotations and the corresponding audio files from YouTube, trims the clips, and saves them as 44.1 kHz WAV files in ./data/AudioCaps.
./data/prepare/AudioCaps.sh
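To sanity-check the script's output, a minimal sketch like the one below could confirm that every saved clip is at 44.1 kHz. It assumes the clips are uncompressed PCM WAV files (readable by Python's standard `wave` module) somewhere under `./data/AudioCaps`. `sample_rate` and `wrong_rate_files` are hypothetical helpers and are not part of this repository.

```python
import wave
from pathlib import Path


def sample_rate(path):
    """Return the sample rate (Hz) of a PCM WAV file (hypothetical helper)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getframerate()


def wrong_rate_files(root, expected=44100):
    """List WAV files under `root` whose sample rate differs from `expected`.

    Hypothetical helper; assumes every *.wav file is PCM, because the
    standard `wave` module cannot read compressed or float WAV files.
    """
    return [p for p in sorted(Path(root).rglob("*.wav")) if sample_rate(p) != expected]
```

For example, `wrong_rate_files("./data/AudioCaps")` should return an empty list if every clip was resampled correctly.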
This downloads and unzips the Clotho audio files, caption files, and metadata into ./data/Clotho. The Clotho audio files are all already sampled at 44.1 kHz.
./data/prepare/Clotho.sh
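Each Clotho clip comes with five captions. The following minimal sketch loads one caption file into a `{file_name: [captions]}` mapping. It assumes the CSV header is `file_name` followed by `caption_1` through `caption_5`, so check this against the files the script actually downloads. `load_clotho_captions` is a hypothetical helper, not part of this repository.

```python
import csv

# Assumed column names of a Clotho caption CSV: one audio file name
# plus five caption columns.
CAPTION_COLUMNS = [f"caption_{i}" for i in range(1, 6)]


def load_clotho_captions(csv_path):
    """Map each audio file name to its non-empty captions (hypothetical helper)."""
    captions = {}
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Keep captions in column order and skip missing or empty cells.
            captions[row["file_name"]] = [row[c] for c in CAPTION_COLUMNS if row.get(c)]
    return captions
```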
Download the GPT-4-generated entailment .csv files and move them to data/AudioCaps/entailment and data/Clotho/entailment.
python ./data/prepare/process.py --data clotho --data_dir ./data
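Because the entailment files are copied in by hand, a small pre-flight check can catch a misplaced file before the processing step runs. The sketch below is a hypothetical helper, not part of `process.py`. It only verifies that each of the two entailment directories named above exists and contains at least one `.csv` file.

```python
from pathlib import Path


def missing_entailment(data_dir="./data"):
    """Return entailment directories that are absent or contain no .csv file.

    Hypothetical helper; it checks only for the presence of CSV files,
    not their contents.
    """
    missing = []
    for dataset in ("AudioCaps", "Clotho"):
        folder = Path(data_dir) / dataset / "entailment"
        if not folder.is_dir() or not any(folder.glob("*.csv")):
            missing.append(str(folder))
    return missing
```

An empty list means both directories contain at least one CSV file.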
- Clone Pengi from microsoft/Pengi
cd ./models
git clone https://github.com/microsoft/Pengi.git
Follow Pengi's README to obtain the model checkpoint.
- Train with main.py
# set your own model_cfg in ./configs for training
python main.py --model pengi --model_cfg ./configs/pengi_linear_classifier.yaml --classifier linear --data_dir ./data
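For reference, the flags in the training command could be parsed with the standard `argparse` module as sketched below. This is a hypothetical illustration, not the repository's `main.py`: the real script may define more options, different defaults, or stricter validation. Only the four flag names come from the command above. The defaults and help strings are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the training command."""
    parser = argparse.ArgumentParser(description="Train a hallucination classifier (sketch)")
    parser.add_argument("--model", required=True, help="backbone name, e.g. pengi")
    parser.add_argument("--model_cfg", required=True, help="path to a YAML config under ./configs")
    # Assumed default; the command above passes "linear" explicitly.
    parser.add_argument("--classifier", default="linear", help="classifier head type, e.g. linear")
    # Assumed default matching the data directory used throughout this README.
    parser.add_argument("--data_dir", default="./data", help="root of the prepared datasets")
    return parser
```

With this sketch, `build_parser().parse_args()` exposes the values as `args.model`, `args.model_cfg`, `args.classifier`, and `args.data_dir`.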
- Data Download & Preparation
  - AudioCaps, Jinju Kim
  - Clotho, Jinju Kim
- Baseline Model Reconstruction
  - MS CLAP '23
  - Pengi-Enc, Jinju Kim
  - LAION CLAP
  - MS CLAP '22