ICML 2025 Spotlight, PCEvolve: Private Contrastive Evolution for Synthetic Dataset Generation via Few-Shot Private Data and Generative APIs

TsingZ0/PCEvolve

Introduction

This repository provides the official implementation of our paper, PCEvolve: Private Contrastive Evolution for Synthetic Dataset Generation via Few-Shot Private Data and Generative APIs, accepted as a spotlight paper at ICML 2025. The project builds on our open-source framework, CoAutoGen, which contains the core baselines and essential utilities. For ongoing updates and maintenance, please refer to CoAutoGen.

Preparation

  1. Prepare the Required Large Model APIs (Skip the Local Setup If Using Online APIs)

    • Make the large model APIs available either by deploying them locally (download the model weights for the captioner, generator, LLM, etc.) or by using online-accessible APIs.

  2. Set Up the Environment

    • Install CUDA.
    • Install the latest Conda and activate it.
    • Create the Conda environment:
      conda env create -f env_cuda_latest.yaml  
      # You may need to downgrade PyTorch using pip to match the CUDA version  
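
To decide whether the PyTorch downgrade mentioned above is needed, it can help to inspect which CUDA version the installed PyTorch build targets. The snippet below is a minimal sketch, not part of this repository: the helper name `describe_torch_env` is hypothetical, and it assumes only that PyTorch may or may not be importable in the active environment.

```python
def describe_torch_env():
    """Return PyTorch/CUDA version details, or note that PyTorch is missing."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in the active environment.
        return {"torch": None}
    return {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_env().items():
        print(f"{key}: {value}")
```

If `cuda_available` is False while a GPU and driver are present, the CUDA version PyTorch was built against probably does not match the local installation, which is the case where reinstalling a matching PyTorch build with pip is worth trying.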

Run the Code

The following command runs a COVID-19 pneumonia detection task. It generates 100 synthetic images per class from 10 real, private chest radiography (X-ray) images held on the edge device, using the Stable Diffusion API. The edge device uses a ResNet-18 model, and Private Contrastive Evolution (PCEvolve) selects the synthetic images and provides privacy-protected feedback:

# Flags are collected in a Bash array so that each one can carry an inline comment.
args=(
  -tt syn                # Task Type: use only the synthetic dataset for the downstream task
  -tm I2I                # Task Mode: Image to Image
  -f Feedback            # Framework: Feedback mechanism
  -did 1                 # GPU device ID
  -eps 0.2               # Privacy budget epsilon per iteration
  -rvpl 1                # Real and private volume per label
  -vpl 2                 # Generated volume per label
  -sgen StableDiffusion  # Generative model: StableDiffusion
  -cret 1                # Other hyperparameter
  -cue ResNet18          # Edge client embedding model
  -cmodel ResNet18       # Edge client model
  -cmp 1                 # Other hyperparameter
  -cef 1                 # Other hyperparameter
  -cdata COVIDx          # Private dataset
  -s PCEvolve            # Synthetic data selector: Private Contrastive Evolution
  -tau 10                # Similarity calibrating factor
)
python -u main.py "${args[@]}"
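
When comparing several privacy budgets, it can be convenient to generate the command line programmatically instead of editing it by hand. The following is a rough sketch under stated assumptions: the flag names and default values are copied from the example command above, while the helper `build_command` and the epsilon values in the sweep are hypothetical and not part of this repository. The script only prints the commands; it does not run `main.py`.

```python
import shlex

# Flag values copied from the example command in this README.
BASE_FLAGS = {
    "-tt": "syn",
    "-tm": "I2I",
    "-f": "Feedback",
    "-did": 1,
    "-eps": 0.2,
    "-rvpl": 1,
    "-vpl": 2,
    "-sgen": "StableDiffusion",
    "-cret": 1,
    "-cue": "ResNet18",
    "-cmodel": "ResNet18",
    "-cmp": 1,
    "-cef": 1,
    "-cdata": "COVIDx",
    "-s": "PCEvolve",
    "-tau": 10,
}


def build_command(overrides=None):
    """Return the argv list for main.py, with optional flag overrides."""
    flags = {**BASE_FLAGS, **(overrides or {})}
    argv = ["python", "-u", "main.py"]
    for flag, value in flags.items():
        argv += [flag, str(value)]
    return argv


if __name__ == "__main__":
    # Hypothetical sweep over the per-iteration privacy budget.
    for eps in (0.1, 0.2, 0.5):
        print(shlex.join(build_command({"-eps": eps})))
```

Each printed line can be pasted into a shell, or the argv list can be passed to `subprocess.run` once the environment from the Preparation section is active.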
