- When you want to fine-tune vision models with a different image size (e.g., optimizing performance for a Kaggle competition dataset with an image size of 64); see the discussion on this use case
- When you want supervised pretraining on ImageNet-1K
- When you want self-supervised pretraining on ImageNet-1K
Download the ImageNet-1K dataset from the Hugging Face Hub and arrange the files as follows (for more details, check data/):
```
data/
├── train_images_0.tar.gz
├── train_images_1.tar.gz
├── train_images_2.tar.gz
├── train_images_3.tar.gz
├── train_images_4.tar.gz
└── val_images.tar.gz
```
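If you prefer to script the download, the archives can be fetched with huggingface_hub. This is only a sketch: the dataset repo id (ILSVRC/imagenet-1k) and the data/ prefix inside the repo are assumptions to verify against the dataset page, and the dataset is gated, so accept the license and log in with huggingface-cli login first.

```python
from huggingface_hub import hf_hub_download

# Assumed repo id and in-repo paths; check the Hugging Face dataset page.
REPO_ID = "ILSVRC/imagenet-1k"
FILES = [f"data/train_images_{i}.tar.gz" for i in range(5)] + ["data/val_images.tar.gz"]

for filename in FILES:
    # local_dir="." preserves the in-repo path, so archives land in ./data/.
    hf_hub_download(
        repo_id=REPO_ID,
        filename=filename,
        repo_type="dataset",
        local_dir=".",
    )
```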
Install the requirements:

```bash
pip install -r requirements.txt
```

Extract the ImageNet-1K archives located in data/, then run the preprocessing script:

```bash
python -m data.preprocess
```
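The extraction step can also be scripted; here is a minimal sketch using Python's tarfile (the per-archive target directories are an assumption, so check data/ for the exact layout that data.preprocess expects):

```python
import tarfile
from pathlib import Path

DATA_DIR = Path("data")

# Extract each archive into its own subdirectory of data/ (assumed layout).
for archive in sorted(DATA_DIR.glob("*.tar.gz")):
    target = DATA_DIR / archive.name.replace(".tar.gz", "")
    target.mkdir(exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(path=target)
```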
Classic supervised training on the ImageNet-1K dataset:
```bash
python -m train --save_dir weights \
    --model_name convnext_base \
    --n_epoch 200 \
    --batch_size 128 \
    --n_worker 8 \
    --n_device 8 \
    --precision 16-mixed \
    --strategy ddp \
    --save_frequency 5 \
    --drop_path_rate 0.5 \
    --label_smoothing 0.1 \
    --input_size 224
```
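For reference, --label_smoothing 0.1 is standard label-smoothed cross-entropy, which PyTorch provides directly; a sketch of the loss alone, not this repo's training loop:

```python
import torch
from torch import nn

# With smoothing 0.1 and K classes, the target distribution gives the true
# class 1 - 0.1 + 0.1/K and every other class 0.1/K.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

logits = torch.randn(4, 1000)           # batch of 4, 1000 ImageNet classes
targets = torch.randint(0, 1000, (4,))  # integer class labels
loss = criterion(logits, targets)
```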
Currently, only Facebook Research's MAE (masked autoencoder) is supported for self-supervised pretraining:
```bash
python -m pretraining --save_dir pretrained_weights \
    --model_name facebook/vit-mae-base \
    --n_epoch 400 \
    --batch_size 256 \
    --n_worker 8 \
    --n_device 8 \
    --precision 16-mixed \
    --strategy ddp \
    --save_frequency 20 \
    --input_size 224 \
    --wd 0.05 \
    --norm_pix_loss
```
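For intuition, --norm_pix_loss switches the MAE reconstruction target to per-patch-normalized pixels, as in the MAE paper. Below is a minimal sketch of the objective using the Hugging Face transformers implementation of the same checkpoint name; whether this repo wraps transformers or the original facebookresearch/mae code is an assumption.

```python
import numpy as np
from transformers import AutoImageProcessor, ViTMAEForPreTraining

# norm_pix_loss=True: targets are per-patch mean/var normalized pixels;
# mask_ratio=0.75 masks 75% of the patches, as in the MAE paper.
model = ViTMAEForPreTraining.from_pretrained(
    "facebook/vit-mae-base", norm_pix_loss=True, mask_ratio=0.75
)
processor = AutoImageProcessor.from_pretrained("facebook/vit-mae-base")

# A random 224x224 RGB array stands in for one ImageNet sample.
image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
inputs = processor(images=image, return_tensors="pt")

outputs = model(**inputs)
print(outputs.loss)  # MSE computed on the masked patches only
```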
Then fine-tune the pretrained model on ImageNet-1K:

```bash
python -m finetuning --save_dir weights \
    --model_name facebook/vit-mae-base \
    --pretrained_dir pretrained_weights \
    --n_epoch 200 \
    --batch_size 128 \
    --n_worker 8 \
    --n_device 8 \
    --precision 16-mixed \
    --strategy ddp \
    --save_frequency 5 \
    --input_size 224 \
    --drop_path_rate 0.1 \
    --label_smoothing 0.1 \
    --wd 0.05
```
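--drop_path_rate (used in both the supervised and fine-tuning commands) controls stochastic depth: during training, each residual branch is skipped per sample with some probability. A generic sketch of the mechanism, not this repo's exact implementation:

```python
import torch
from torch import nn

class DropPath(nn.Module):
    """Stochastic depth: randomly drop the residual branch per sample."""

    def __init__(self, drop_prob: float = 0.1):
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.drop_prob == 0.0 or not self.training:
            return x
        keep_prob = 1.0 - self.drop_prob
        # One Bernoulli draw per sample, broadcast over the remaining dims.
        shape = (x.shape[0],) + (1,) * (x.dim() - 1)
        mask = torch.empty(shape, dtype=x.dtype, device=x.device).bernoulli_(keep_prob)
        return x * mask / keep_prob
```

Inside a residual block, the branch output passes through this module before being added back to the input; in practice the rate is usually scaled linearly across the network's blocks.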
- Check the results/ directory for the full results.
- mae_vit_base
  - Pretraining takes about 48 hours on 8 x RTX 3090.
  - Fine-tuning takes about 36 hours on 8 x RTX 3090.
| metric   | mae_vit_base | vit_base |
|----------|--------------|----------|
| top1_acc | 81.24        | 78.47    |
This project makes use of the following libraries and models: