Before I ask my questions, thank you for sharing this helpful information.

I have three questions:

1. I tried the Multilingual-CLIP you mentioned in issue #1, but the only ViT-B models it provides are ViT-B/16+ and ViT-B/32. Is it correct that only ViT-B/16 can serve as the base model for the pretrained CLIP in CLIP4STR?
2. Is there a way to produce inference results in other languages without additional fine-tuning?
3. Do you have any plans to write a guide for fine-tuning CLIP4STR in another language?
I do believe the ViT-B/16 model is a decent choice for STR fine-tuning; the training data has a bigger influence. You can collect more high-quality data to improve your model's performance. ViT-B/16 is also more efficient, which may make it the better choice in real applications.
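For reference, here is a minimal sketch of how one might load the ViT-B/16+ variant from Multilingual-CLIP together with its matching open_clip image tower. The checkpoint names follow the Multilingual-CLIP README as I recall it; please verify them against the repo before relying on this.

```python
# Sketch: pairing a Multilingual-CLIP text encoder with its matching
# open_clip image tower. Note that Multilingual-CLIP only ships ViT-B/32
# and ViT-B/16+ among the ViT-B variants, so there is no drop-in
# multilingual twin of the plain ViT-B/16 used by CLIP4STR.
import open_clip
import transformers
from multilingual_clip import pt_multilingual_clip

# Multilingual text encoder (checkpoint name from the M-CLIP README).
text_model_name = 'M-CLIP/XLM-Roberta-Large-Vit-B-16Plus'
text_model = pt_multilingual_clip.MultilingualCLIP.from_pretrained(text_model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(text_model_name)

# The matching image encoder comes from open_clip, per the M-CLIP README.
image_model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-B-16-plus-240', pretrained='laion400m_e32')

# Multilingual text embeddings, e.g. for a Korean prompt.
embeddings = text_model.forward(['안녕하세요'], tokenizer)
print(embeddings.shape)
```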
No, I do not know a practical way to do this. Fancy approaches, such as using diffusion models or other methods to directly transform an image containing Korean text into one containing English text, are unlikely to perform well.