This repository provides the official implementation of our paper, PCEvolve: Private Contrastive Evolution for Synthetic Dataset Generation via Few-Shot Private Data and Generative APIs, accepted as a spotlight paper at ICML 2025. The project builds on our open-source framework, CoAutoGen, which contains the core baselines and essential utilities. For ongoing updates and maintenance, please refer to CoAutoGen.
Prepare the Required Large Model APIs (Skip if Using Online APIs)
- Set up the large model APIs (captioner, generator, LLM, etc.) either by deploying them locally, which requires downloading the model weights, or by using APIs that are accessible online.
Set Up the Environment
- Install CUDA.
- Install the latest Conda and activate it.
- Create the Conda environment:
conda env create -f env_cuda_latest.yaml # You may need to downgrade PyTorch using pip to match the CUDA version
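Because the note above mentions that PyTorch may need to be downgraded to match the CUDA version, a quick check of the installed pairing can help. The following is a minimal sketch, not part of this repository: the helper name `cuda_report` is hypothetical, and it relies only on the standard PyTorch attributes `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
def cuda_report():
    """Return a small dict describing the PyTorch/CUDA pairing in this environment.

    Hypothetical helper for illustration; it is not provided by the repository.
    """
    report = {
        "torch_installed": False,
        "torch_version": None,
        "built_for_cuda": None,
        "cuda_available": False,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in the active environment.
        return report

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # CUDA version the PyTorch build targets; None for CPU-only builds.
    report["built_for_cuda"] = torch.version.cuda
    # True only if a usable GPU and driver are visible to PyTorch.
    report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as `False` while `built_for_cuda` is set, the PyTorch build and the installed CUDA driver probably do not match, which is the case where reinstalling PyTorch with pip is likely needed.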
For a COVID-19 pneumonia detection task, generate 100 synthetic images per class from 10 real, private chest radiography (X-ray) images held on the edge device, using the Stable Diffusion API. The edge device uses ResNet-18 as its model, and Private Contrastive Evolution (PCEvolve) handles selection and feedback under privacy protection:
# Bash: keep the flags in an array so that each one can carry its own comment.
# (A comment placed after a trailing backslash would break the line continuation.)
args=(
  -tt syn               # Task type: use only the synthetic dataset for the downstream task
  -tm I2I               # Task mode: image to image
  -f Feedback           # Framework: feedback mechanism
  -did 1                # GPU device ID
  -eps 0.2              # Privacy budget epsilon per iteration
  -rvpl 1               # Real and private volume per label
  -vpl 2                # Generated volume per label
  -sgen StableDiffusion # Generative model: Stable Diffusion
  -cret 1               # Other hyperparameter
  -cue ResNet18         # Edge client embedding model
  -cmodel ResNet18      # Edge client model
  -cmp 1                # Other hyperparameter
  -cef 1                # Other hyperparameter
  -cdata COVIDx         # Private dataset
  -s PCEvolve           # Synthetic data selector: Private Contrastive Evolution
  -tau 10               # Similarity calibrating factor
)
python -u main.py "${args[@]}"
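The `-eps` flag above is described as the privacy budget per iteration, so the overall budget grows with the number of iterations. The sketch below assumes basic sequential composition of pure differential privacy, where the per-iteration budgets simply add up. The accounting actually used by PCEvolve may differ, and both the function name `total_epsilon` and the iteration counts are hypothetical values chosen for illustration.

```python
def total_epsilon(eps_per_iteration: float, iterations: int) -> float:
    """Upper bound on the total privacy budget under basic sequential composition.

    Assumption: each iteration is eps_per_iteration-DP and budgets add linearly.
    This is a generic DP bound, not necessarily the accounting used by PCEvolve.
    """
    if eps_per_iteration <= 0:
        raise ValueError("eps_per_iteration must be positive")
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    return eps_per_iteration * iterations


if __name__ == "__main__":
    # Iteration counts are illustrative; 0.2 matches the -eps value in the command above.
    for iterations in (1, 5, 10):
        print(f"{iterations} iteration(s): total epsilon <= {total_epsilon(0.2, iterations):.1f}")
```

Under this assumption, a smaller `-eps` or fewer iterations both lower the total budget, at the likely cost of noisier selection.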