Before I ask my questions, thank you for sharing this helpful information.

I have three questions:

1. I tried the Multilingual-CLIP you mentioned in issue #1, but the only ViT-B models it provides are ViT-B/16+ and ViT-B/32. Is it correct that only ViT-B/16 can serve as the base model for the pretrained CLIP in CLIP4STR?
2. Is there a way to produce inference results in other languages without additional fine-tuning?
3. Do you have any plans to write a guide for fine-tuning CLIP4STR in another language?
I do believe the ViT-B/16 model is a decent choice for STR fine-tuning; the training data has a bigger influence. You can collect more high-quality data to improve your model's performance. ViT-B/16 is also more efficient, which may make it the better choice in real applications.
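For reference, here is a minimal sketch of how one might load the ViT-B/16+ variant from Multilingual-CLIP together with its matching open_clip image tower. The checkpoint names follow the Multilingual-CLIP README as I recall it; please verify them against the repo before relying on this.

```python
# Sketch: pairing a Multilingual-CLIP text encoder with its matching
# open_clip image tower. Note that Multilingual-CLIP only ships ViT-B/32
# and ViT-B/16+ among the ViT-B variants, so there is no drop-in
# multilingual twin of the plain ViT-B/16 used by CLIP4STR.
import open_clip
import transformers
from multilingual_clip import pt_multilingual_clip

# Multilingual text encoder (checkpoint name from the M-CLIP README).
text_model_name = 'M-CLIP/XLM-Roberta-Large-Vit-B-16Plus'
text_model = pt_multilingual_clip.MultilingualCLIP.from_pretrained(text_model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(text_model_name)

# The matching image encoder comes from open_clip, per the M-CLIP README.
image_model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-B-16-plus-240', pretrained='laion400m_e32')

# Multilingual text embeddings, e.g. for a Korean prompt.
embeddings = text_model.forward(['안녕하세요'], tokenizer)
print(embeddings.shape)
```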
No, I do not know a practical way to do this. Fancy approaches, such as using diffusion models or other methods to directly transform an image containing Korean text into one containing English text, are unlikely to perform well.