
Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing

[CVPR 2025] Official implementation of the paper "Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing"


🌼 Environment

We use Python 3.8.18 and CUDA 11.8; other compatible versions may also work. Both training and inference are implemented in PyTorch and run on a GeForce RTX 4090 GPU.

conda create -n dubbing python=3.8.18
conda activate dubbing
pip install -r requirements.txt
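
After installing the requirements, a quick sanity check confirms that PyTorch sees the GPU. This is a minimal sketch; it only assumes PyTorch was installed via requirements.txt:

import torch

# Verify the PyTorch build, the CUDA version it was built against, and GPU visibility.
print("PyTorch:", torch.__version__)
print("CUDA build:", torch.version.cuda)        # expected: 11.8
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))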

🔧 Training

For First Stage (Acoustic Pre-training)

python train_first.py -p Configs/config_v2c_stage1.yml  # V2C-Animation benchmark
python train_first.py -p Configs/config_grid_stage1.yml  # GRID benchmark

For Second Stage (Prosody Adapting)

python train_second.py -p Configs/config_v2c.yml  # V2C-Animation benchmark
python train_second_grid.py -p Configs/config_grid.yml  # GRID benchmark
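
The two stages run sequentially: the second stage adapts prosody on top of the acoustic model pre-trained in the first stage. A minimal sketch of the full V2C-Animation pipeline (the script and config names are taken from the commands above; everything else is illustrative):

import subprocess

# Stage 1: prosody-enhanced acoustic pre-training on the V2C-Animation benchmark.
subprocess.run(
    ["python", "train_first.py", "-p", "Configs/config_v2c_stage1.yml"],
    check=True,
)

# Stage 2: acoustic-disentangled prosody adapting, building on the stage-1 model.
subprocess.run(
    ["python", "train_second.py", "-p", "Configs/config_v2c.yml"],
    check=True,
)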

💡 Checkpoints

We provide pre-trained checkpoints for both stages on the V2C-Animation and GRID benchmarks:

First stage (for second-stage training only; cannot directly generate waveforms)

Second stage (can be used to directly generate waveforms)
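
To verify a downloaded checkpoint before training or inference, a minimal inspection sketch (the file name is hypothetical, and the stored keys depend on the actual checkpoint format):

import torch

# Load on CPU so inspection does not require a GPU.
ckpt = torch.load("second_stage_v2c.pth", map_location="cpu")  # hypothetical file name

# List the top-level entries (e.g. model weights, optimizer state, epoch).
for key in ckpt:
    print(key)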

✍ Inference

For V2C-Animation Benchmark

There are three generation settings in the V2C-Animation benchmark:

python inference_v2c.py -n 'YOUR_EXP_NAME' --epoch 'YOUR_EPOCH' --setting 1
python inference_v2c.py -n 'YOUR_EXP_NAME' --epoch 'YOUR_EPOCH' --setting 2
python inference_v2c.py -n 'YOUR_EXP_NAME' --epoch 'YOUR_EPOCH' --setting 3
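
To generate results under all three settings in one run, a minimal sketch looping over the commands above (the experiment name and epoch are placeholders for your own run):

import subprocess

EXP_NAME = "YOUR_EXP_NAME"   # placeholder: your experiment name
EPOCH = "YOUR_EPOCH"         # placeholder: the checkpoint epoch to use

for setting in (1, 2, 3):
    subprocess.run(
        ["python", "inference_v2c.py",
         "-n", EXP_NAME, "--epoch", EPOCH, "--setting", str(setting)],
        check=True,
    )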

For GRID benchmark

There are two generation settings in the GRID benchmark:

python inference_grid.py -n 'YOUR_EXP_NAME' --epoch 'YOUR_EPOCH' --setting 1
python inference_grid.py -n 'YOUR_EXP_NAME' --epoch 'YOUR_EPOCH' --setting 2

📊 Dataset

🙏 Acknowledgments

We would like to thank the authors of previous related projects for generously sharing their code and insights: StyleTTS, StyleTTS2, StyleDubber, PL-BERT, and HiFi-GAN.

🤝 Citation

If you find our work useful, please consider citing:

@misc{zhang2025produbber,
      title={Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing}, 
      author={Zhedong Zhang and Liang Li and Chenggang Yan and Chunshan Liu and Anton van den Hengel and Yuankai Qi},
      year={2025},
      eprint={2503.12042},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2503.12042}, 
}
