FoldMark: Safeguarding Protein Structure Generative Models with Distributional and Evolutionary Watermarking
We've created an interactive demo on Hugging Face Spaces where you can:
- Input protein sequences and get watermarked structure predictions
- Compare watermarked vs. non-watermarked structures
- Visualize the differences in 3D
- Access pretrained checkpoints and inference code
FoldMark is a novel watermarking framework for protein generative models that embeds user-specific identification codes into generated protein structures. It:
- Leverages evolutionary principles to adaptively embed watermarks (higher capacity in flexible regions, minimal disruption in conserved areas)
- Maintains structural quality (>0.9 scTM scores) while achieving >95% watermark bit accuracy at 32 bits
- Enables tracking of up to 1 million users and detection of unauthorized model training (even with only 30% watermarked data)
- Works with leading models like AlphaFold3, ESMFold, RFDiffusion, and RFDiffusionAA
- Withstands post-processing and adaptive attacks, offering a general solution for responsible protein-AI deployment
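To give a rough feel for the capacity claims above, here is a minimal sketch (all names and helpers are our own illustration, not FoldMark's API): one million users need only 20 bits (2^20 ≈ 1.05M), so a 32-bit code leaves headroom, and watermark recovery is scored as the fraction of matching bits.

```python
# Hypothetical sketch: user IDs as 32-bit watermark codes plus bit-accuracy
# scoring. Sizes follow the README's claims (32-bit codes, >95% bit accuracy,
# up to ~1M distinct users); none of these names come from the FoldMark code.

def user_code(user_id: int, n_bits: int = 32) -> list[int]:
    """Map a user ID to a fixed-length bit string (MSB first)."""
    if not 0 <= user_id < 2 ** n_bits:
        raise ValueError("user_id does not fit in the code length")
    return [(user_id >> i) & 1 for i in reversed(range(n_bits))]

def bit_accuracy(embedded: list[int], recovered: list[int]) -> float:
    """Fraction of watermark bits recovered correctly."""
    assert len(embedded) == len(recovered)
    matches = sum(e == r for e, r in zip(embedded, recovered))
    return matches / len(embedded)

code = user_code(123_456)           # ~1M users fit comfortably in 32 bits
noisy = code.copy()
noisy[0] ^= 1                       # flip one bit to mimic imperfect recovery
print(bit_accuracy(code, noisy))    # 31/32 = 0.96875, above the 95% mark
```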
```shell
# Create and activate the conda environment
conda env create -f foldmark.yml
conda activate fm

# Install torch-scatter (wheels built for torch 2.0.0 + CUDA 11.7)
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.0.0+cu117.html

# Install the local package
pip install -e .
```
- Download preprocessed SCOPe dataset (~280MB): Download Link
- Extract the data:
```shell
tar -xvzf preprocessed_scope.tar.gz
rm preprocessed_scope.tar.gz
```
- Pretrain the model:
```shell
python -W ignore experiments/pretrain.py
```
- Finetune with watermarking:
```shell
python -W ignore experiments/finetune.py
```
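To illustrate how detection of unauthorized training can be made statistical (our illustration, not the paper's exact procedure): if a model trained on watermarked data reproduces the embedded bits noticeably better than chance, a one-sided binomial test against 50% random guessing flags it.

```python
# Illustrative sketch (not FoldMark's actual detector): test whether the
# number of correctly recovered watermark bits exceeds random guessing.
from math import comb

def binom_pvalue(n_bits: int, n_correct: int) -> float:
    """One-sided p-value for >= n_correct matching bits under p = 0.5."""
    return sum(comb(n_bits, k) for k in range(n_correct, n_bits + 1)) / 2 ** n_bits

# Recovering 30 of 32 bits is vastly beyond chance:
print(binom_pvalue(32, 30) < 1e-6)  # True
```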
If you find this work helpful, please cite our paper:
```bibtex
@article{zhang2024foldmark,
  title={FoldMark: Protecting Protein Generative Models with Watermarking},
  author={Zhang, Zaixi and Jin, Ruofan and Fu, Kaidi and Cong, Le and Zitnik, Marinka and Wang, Mengdi},
  journal={bioRxiv},
  pages={2024--10},
  year={2024},
  publisher={Cold Spring Harbor Laboratory}
}
```
We thank the open-source projects whose code this work builds on.
This project is licensed under the MIT License - see the LICENSE file for details.