🖼️ Gallery • 📊 HDR28K • 🔥 Model Zoo • 🔥 Dataset Zoo • 🚧 Installation • 📺 Inference • 📏 Evaluation
- We introduce a Historical Document Repair (HDR) task, which endeavors to predict the original appearance of damaged historical document images.
- We build a large-scale historical document repair dataset, termed HDR28K, which includes 28,552 damaged-repaired image pairs with character-level annotations and multi-style degradation.
- 🔥🔥🔥 We propose a Diffusion-based Historical Document Repair method (DiffHDR), which augments the DDPM framework with semantic and spatial information.
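For intuition only, the sketch below shows one common way to inject such image-level conditions into a DDPM denoiser via channel-wise concatenation; the tensor names, channel counts, and the concatenation itself are illustrative assumptions, not the exact DiffHDR architecture.

```python
import torch

# Hypothetical shapes: 3-channel RGB conditions and a 1-channel mask,
# all resized to the same spatial resolution as the noisy image x_t.
x_t = torch.randn(1, 3, 256, 256)        # noisy image at timestep t
degraded = torch.randn(1, 3, 256, 256)   # damaged document image (spatial condition)
content = torch.randn(1, 3, 256, 256)    # rendered character content (semantic condition)
char_mask = torch.rand(1, 1, 256, 256)   # character-level damage mask

# Channel-wise concatenation is a common way to feed image conditions
# to a denoising UNet; DiffHDR's actual conditioning may differ.
unet_input = torch.cat([x_t, degraded, content, char_mask], dim=1)
print(unet_input.shape)  # torch.Size([1, 10, 256, 256])
```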
- 2025.03.20: 🎉🎉 The Historical Document Repair dataset HDR28K is released!
- 2024.12.17: Release inference code.
- 2024.12.10: 🎉🎉 Our paper is accepted by AAAI 2025.
Model | Checkpoint | Status |
---|---|---|
DiffHDR | GoogleDrive / BaiduYun:x62f | Released |
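After downloading, a quick sanity check that the file loads is shown below; the local path is a placeholder, and this assumes the release is a single PyTorch state_dict file (if it is a diffusers-style directory, load it with the corresponding `from_pretrained` call instead).

```python
import torch

# Placeholder path; point it at the downloaded DiffHDR UNet checkpoint.
ckpt_path = "ckpt/diffhdr_unet.pth"
state_dict = torch.load(ckpt_path, map_location="cpu")

# A state_dict maps parameter names to tensors; printing a few keys
# confirms the checkpoint was downloaded intact.
print(type(state_dict), len(state_dict))
print(list(state_dict)[:5])
```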
Dataset | Download | Status |
---|---|---|
HDR28K | BaiduYun:upm9 | Released |
The dataset file structure is as follows:
- character_missing
  - test
    - char_mask_images
    - content_images
    - degraded_images
    - original_images
  - train
    - char_mask_images
    - content_images
    - degraded_images
    - original_images
- ink_erosion
  - similar to 'character_missing'
- paper_damage
  - similar to 'character_missing'
- test_image_only_damage
  - hole_M5_image_2000_32_467_544_979_degrade0.png
  - ......
NOTE: The test_image_only_damage directory contains the ground-truth images with the non-damaged regions replaced, so that only the damaged regions are evaluated.
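A minimal sketch for collecting damaged-repaired pairs from one split, assuming the same file name is used across the four sub-folders (the matching rule and the local root path are assumptions):

```python
from pathlib import Path

# Root of HDR28K and one damage type / split; adjust to your local path.
root = Path("HDR28K/character_missing/train")

degraded_dir = root / "degraded_images"
original_dir = root / "original_images"
content_dir = root / "content_images"
mask_dir = root / "char_mask_images"

pairs = []
for degraded_path in sorted(degraded_dir.glob("*.png")):
    name = degraded_path.name
    # Assumption: the same file name is used in all four sub-folders.
    sample = {
        "degraded": degraded_path,
        "original": original_dir / name,
        "content": content_dir / name,
        "char_mask": mask_dir / name,
    }
    if all(p.exists() for p in sample.values()):
        pairs.append(sample)

print(f"Found {len(pairs)} complete damaged-repaired pairs")
```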
- Linux
- Python 3.9
- PyTorch 1.13.1
- CUDA 11.7
Clone this repo:
git clone https://github.com/yeungchenwa/HDR.git
Step 0: Download and install Miniconda from the official website.
Step 1: Create a conda environment and activate it.
conda create -n diffhdr python=3.9 -y
conda activate diffhdr
Step 2: Install the corresponding version of PyTorch by following the instructions here.
# Suggested
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
Step 3: Install the required packages.
pip install -r requirements.txt
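A quick check that the environment matches the suggested versions and that the GPU is visible:

```python
import torch
import torchvision

print("torch:", torch.__version__)              # expected 1.13.1+cu117
print("torchvision:", torchvision.__version__)  # expected 0.14.1+cu117
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```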
Use DiffHDR to repair damaged historical documents (some examples, including damaged images, mask images, and content images, are provided in /examples):
sh scripts/inference.sh
- `device`: CUDA or CPU used for inference.
- `image_path`: The damaged image path.
- `mask_image_path`: The masked image path.
- `content_image_path`: The content image path.
- `save_dir`: The directory for saving the repaired image.
- `content_mask_guidance_scale`: The guidance scale of the content image and masked image.
- `degraded_guidance_scale`: The guidance scale of the damaged image.
- `ckpt_path`: The UNet checkpoint path.
- `num_inference_steps`: The number of inference steps.
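If you would rather drive inference from Python than edit scripts/inference.sh, the sketch below mirrors the parameters listed above; the inference.py entry point, the exact flag spellings, and the example file names are assumptions, so treat the shell script as the authoritative command.

```python
import subprocess

# Hypothetical invocation; flag names mirror the parameters documented above.
cmd = [
    "python", "inference.py",
    "--device", "cuda:0",
    "--image_path", "examples/degraded.png",
    "--mask_image_path", "examples/mask.png",
    "--content_image_path", "examples/content.png",
    "--save_dir", "outputs/",
    "--content_mask_guidance_scale", "1.5",
    "--degraded_guidance_scale", "1.5",
    "--ckpt_path", "ckpt/diffhdr_unet.pth",
    "--num_inference_steps", "50",
]
subprocess.run(cmd, check=True)
```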
Coming soon ...
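Until the official evaluation code is released, a minimal stand-in for comparing a repaired output against its ground truth with full-image PSNR is sketched below; this is a common baseline metric, not necessarily the paper's evaluation protocol, and the file paths are placeholders.

```python
import numpy as np
from PIL import Image

def psnr(a: np.ndarray, b: np.ndarray) -> float:
    # Peak signal-to-noise ratio for uint8 images in [0, 255].
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

# Placeholder paths: a repaired output and the matching ground-truth image.
repaired = np.array(Image.open("outputs/repaired.png").convert("RGB"))
gt = np.array(Image.open("HDR28K/character_missing/test/original_images/xxx.png").convert("RGB"))

print("PSNR:", psnr(repaired, gt))
```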
- This repository can only be used for non-commercial research purposes.
- For commercial use, please contact Prof. Lianwen Jin (eelwjin@scut.edu.cn).
- Copyright 2024, Deep Learning and Vision Computing Lab (DLVC-Lab), South China University of Technology.
@inproceedings{yang2025predicting,
  title={Predicting the Original Appearance of Damaged Historical Documents},
  author={Yang, Zhenhua and Peng, Dezhi and Shi, Yongxin and Zhang, Yuyi and Liu, Chongyu and Jin, Lianwen},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2025}
}