Implementation of FoldMark: Safeguarding Protein Structure Generative Models with Distributional and Evolutionary Watermarking

zaixizhang/FoldMark

FoldMark: Safeguarding Protein Structure Generative Models with Distributional and Evolutionary Watermarking

🌟 Try Our Demo!

We've created an interactive demo on Hugging Face Spaces where you can:

  • Input protein sequences and get watermarked structure predictions
  • Compare watermarked vs. non-watermarked structures
  • Visualize the differences in 3D
  • Access pretrained checkpoints and inference code

Try the Demo →

🚀 Overview

FoldMark is a novel watermarking framework for protein generative models that embeds user-specific data across protein structures. It:

  • Leverages evolutionary principles to adaptively embed watermarks (higher capacity in flexible regions, minimal disruption in conserved areas)
  • Maintains structural quality (>0.9 scTM scores) while achieving >95% watermark bit accuracy at 32 bits
  • Enables tracking of up to 1 million users and detection of unauthorized model training (even when only 30% of the training data is watermarked)
  • Works with leading models like AlphaFold3, ESMFold, RFdiffusion, and RFdiffusionAA
  • Withstands post-processing and adaptive attacks, offering a generalized solution for ethical protein AI deployment
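To make the "bit accuracy at 32 bits" figure concrete, here is a minimal sketch of how such a metric can be computed. It is not FoldMark's encoder or decoder: the fixed-width binary mapping in `user_id_to_bits` is a hypothetical way to turn a user ID into a 32-bit code, and `chance_match_probability` is a standard binomial tail used here only to illustrate how unlikely a high match count is if the decoded bits were random.

```python
from math import comb


def user_id_to_bits(user_id: int, n_bits: int = 32) -> list[int]:
    """Hypothetical mapping: write the user ID as a fixed-width binary code, MSB first."""
    if not 0 <= user_id < 2 ** n_bits:
        raise ValueError("user_id does not fit in n_bits")
    return [(user_id >> i) & 1 for i in reversed(range(n_bits))]


def bit_accuracy(embedded: list[int], decoded: list[int]) -> float:
    """Fraction of positions where the decoded bit equals the embedded bit."""
    if len(embedded) != len(decoded):
        raise ValueError("bit strings must have the same length")
    matches = sum(a == b for a, b in zip(embedded, decoded))
    return matches / len(embedded)


def chance_match_probability(n_matches: int, n_bits: int) -> float:
    """P(at least n_matches of n_bits agree) if every decoded bit were a fair coin flip."""
    tail = sum(comb(n_bits, k) for k in range(n_matches, n_bits + 1))
    return tail / 2 ** n_bits


# Toy example: embed a code, then simulate a decoder that gets one bit wrong.
code = user_id_to_bits(123_456)
decoded = code.copy()
decoded[5] ^= 1  # flip a single bit

print(bit_accuracy(code, decoded))       # 31 of 32 bits match
print(chance_match_probability(31, 32))  # probability of >= 31 matches by chance
```

Under this toy mapping, 20 bits would already distinguish 2^20 = 1,048,576 IDs, so a 32-bit code leaves room to spare for one million users.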

📊 Results

Structure Prediction with Watermarking

De Novo Protein Structure Design with Watermarking

🛠️ Installation

# Create and activate conda environment
conda env create -f foldmark.yml
conda activate fm

# Install torch-scatter
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.0.0+cu117.html

# Install local package
pip install -e .
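
After installation, a quick check that the key modules can be found may save a failed training run later. The sketch below is an assumption-laden helper rather than part of the repository: it only tests the import names `torch` and `torch_scatter`, and it does not check the local package because its import name is not stated here.

```python
from importlib.util import find_spec

# Import names to look for; torch-scatter is imported as `torch_scatter`.
REQUIRED_MODULES = ("torch", "torch_scatter")


def missing_modules(names=REQUIRED_MODULES) -> list[str]:
    """Return the modules from `names` that cannot be located in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    import torch

    # The wheel index used above targets torch 2.0.0 with CUDA 11.7,
    # so the reported versions should be compatible with that build.
    print("torch version:", torch.__version__)
    print("CUDA build:", torch.version.cuda)
```

Run it inside the activated `fm` environment. If `torch_scatter` is reported missing, the wheel index in the `pip install torch-scatter` command most likely does not match the installed torch build.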

📊 Training Pipeline

Data Setup

  1. Download preprocessed SCOPe dataset (~280MB): Download Link
  2. Extract the data:
    tar -xvzf preprocessed_scope.tar.gz
    rm preprocessed_scope.tar.gz

Training Steps

  1. Pretrain the model:
    python -W ignore experiments/pretrain.py
  2. Finetune with watermarking:
    python -W ignore experiments/finetune.py
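
The two training steps can also be chained so that finetuning starts only if pretraining exits cleanly. The driver below is a hypothetical convenience wrapper, not a script shipped with the repository. It assumes it is run from the repository root and that neither script needs extra command-line arguments.

```python
import subprocess
import sys

# Stage name and script path, in the order given in the steps above.
STAGES = [
    ("pretrain", "experiments/pretrain.py"),
    ("finetune", "experiments/finetune.py"),
]


def build_command(script: str, python: str = sys.executable) -> list[str]:
    """Build the same invocation as the README: python -W ignore <script>."""
    return [python, "-W", "ignore", script]


def run_pipeline(stages=STAGES, runner=subprocess.run) -> None:
    """Run each stage in order; a non-zero exit code stops the pipeline."""
    for name, script in stages:
        print(f"[{name}] running {script}")
        # check=True raises subprocess.CalledProcessError if the script fails,
        # so later stages never run on top of a failed earlier stage.
        runner(build_command(script), check=True)
```

Calling `run_pipeline()` from the repository root launches pretraining followed by finetuning. The `runner` parameter exists only so the sequencing can be exercised without starting real training.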

📝 Citation

If you find this work helpful, please cite our paper:

@article{zhang2024foldmark,
  title={FoldMark: Protecting Protein Generative Models with Watermarking},
  author={Zhang, Zaixi and Jin, Ruofan and Fu, Kaidi and Cong, Le and Zitnik, Marinka and Wang, Mengdi},
  journal={bioRxiv},
  pages={2024--10},
  year={2024},
  publisher={Cold Spring Harbor Laboratory}
}

🙏 Acknowledgments

We thank the following open-source projects for their valuable contributions:

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
