- Contains code for inference using meta-llama/Llama-3.2-11B-Vision.
- Works on the given dataset.
- Details for fine-tuning meta-llama/Llama-3.2-11B-Vision.
- Code for generating the ground truth CSV file.
- Includes code for ground truth generation; the processed dataset is pushed to Hugging Face.
- Includes code to structure text extracted from images using the Mistral API.
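The ground-truth step pairs each image with its reference text and stores the pairs in a CSV file. The repository's actual CSV schema is not documented here, so the sketch below is a minimal illustration under stated assumptions: the column names `image_file` and `ground_truth` and the helpers `write_ground_truth_csv` / `read_ground_truth_csv` are hypothetical, and the input text is assumed to have already been structured (for example by the Mistral API step).

```python
import csv
import io

# Hypothetical column names; the repository's real CSV schema may differ.
FIELDNAMES = ["image_file", "ground_truth"]


def write_ground_truth_csv(records, fileobj):
    """Write (image_file, text) pairs as a ground-truth CSV (hypothetical helper).

    `text` is assumed to be the already-structured text for that image.
    """
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    for image_file, text in records:
        # csv quotes fields that contain commas, quotes or newlines.
        writer.writerow({"image_file": image_file, "ground_truth": text.strip()})


def read_ground_truth_csv(fileobj):
    """Read the CSV back into a dict mapping image file -> reference text."""
    return {row["image_file"]: row["ground_truth"] for row in csv.DictReader(fileobj)}


# newline="" follows the csv module's recommendation for file-like objects.
buffer = io.StringIO(newline="")
write_ground_truth_csv([("page_001.png", "Invoice No. 42, total: $10\n")], buffer)
buffer.seek(0)
print(read_ground_truth_csv(buffer))
```

Letting `csv.DictWriter` handle quoting matters here because OCR text routinely contains commas, quotes and line breaks that would corrupt a naively joined CSV line.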
Baseline Inference Metrics
- Average Sequence Accuracy: 0.330054
- WER (Average): 2.532779
- CER (Average): 1.510348
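The README does not say how WER and CER are computed. The sketch below uses the standard edit-distance definitions, assuming per-sample scores averaged over the dataset: WER is (substitutions + deletions + insertions) divided by the number of reference words, and CER is the same ratio over characters. Because insertions are counted but the denominator is the reference length, a model that emits much more text than the reference can score above 1.0, which is consistent with the averages above. The function names here are illustrative, and the repository's own implementation may differ.

```python
def levenshtein(ref, hyp):
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (or match at no cost)
            )
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over the reference word count."""
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over the reference length."""
    return levenshtein(list(reference), list(hypothesis)) / max(len(reference), 1)


def average(metric, pairs):
    """Mean of a metric over (reference, hypothesis) pairs (assumed averaging scheme)."""
    return sum(metric(ref, hyp) for ref, hyp in pairs) / len(pairs)


# Three inserted words against a two-word reference give a WER of 1.5.
print(wer("hello world", "hello there big wide world"))
```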
Fine-Tuned Inference Metrics
- Average Sequence Accuracy: 0.448970
- WER (Average): 1.685408
- CER (Average): 1.428369
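The effect of fine-tuning can be read off by comparing the two metric blocks. The short calculation below uses only the numbers reported above: sequence accuracy rises by about 36% relative, average WER falls by about a third, and average CER falls by only about 5%, which suggests the fine-tuned model mostly stopped producing extra or wrong words while character-level accuracy changed little.

```python
# Averages copied from the baseline and fine-tuned metric lists above.
baseline = {"sequence_accuracy": 0.330054, "wer": 2.532779, "cer": 1.510348}
finetuned = {"sequence_accuracy": 0.448970, "wer": 1.685408, "cer": 1.428369}


def relative_change(before, after):
    """Change relative to the baseline value (negative means it went down)."""
    return (after - before) / before


for name in baseline:
    delta = finetuned[name] - baseline[name]
    rel = relative_change(baseline[name], finetuned[name])
    print(f"{name}: {delta:+.6f} ({rel:+.1%})")
```

For accuracy higher is better, while for WER and CER lower is better, so all three changes point in the expected direction.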