
Could you share the training loss to improve reproducibility? #83

Open · xuanqing94 opened this issue Jun 19, 2023 · 4 comments
xuanqing94 commented Jun 19, 2023

Hi, thanks for sharing the datasets! I'm trying to train a FLAN model using T5 and other backbone models, but I'm not confident in how well I reproduced your results. Specifically, I got much lower MMLU scores. Could you please share the training loss curve (or simply the loss at convergence)? Below is mine:
[image: training loss curve]

I was using similar settings (batch size = 80, max_seq_len = 2300).
The final loss is around 0.6 after smoothing. What are the official values?
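For anyone trying to reproduce a similar run, here is a minimal sketch of this kind of fine-tuning setup using Hugging Face Transformers. Only the effective batch size (80) and max sequence length (2300) come from the comment above; the dataset file, backbone checkpoint, learning rate, and epoch count are illustrative assumptions, not the actual setup.

```python
# Minimal FLAN-style fine-tuning sketch with Hugging Face Transformers.
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "t5-large"  # backbone checkpoint; assumption
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Hypothetical FLAN-mixture file with "inputs"/"targets" columns.
dataset = load_dataset("json", data_files="flan_mixture.jsonl", split="train")

max_seq_len = 2300  # from the comment above

def preprocess(example):
    model_inputs = tokenizer(example["inputs"], max_length=max_seq_len, truncation=True)
    labels = tokenizer(example["targets"], max_length=256, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, remove_columns=dataset.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-repro",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=10,  # 8 x 10 = effective batch size 80
    learning_rate=1e-4,              # assumption; tune as needed
    logging_steps=50,                # logs the training loss curve
    num_train_epochs=1,              # assumption
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```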

@StephennFernandes

Hey, could you please let me know where I can find the scripts/.gin files to train FLAN on t5x-based models?

@xuanqing94 (Author)

@StephennFernandes I can't help you with that because I'm using a PyTorch-based training framework.

@StephennFernandes

You mean you used the Hugging Face model and fine-tuned it on the FLAN datasets?

That works fine for me as well.

BTW, did you get results relatively similar to the official FLAN-T5's?

@xuanqing94 (Author)

I used checkpoints downloaded from Hugging Face, but I ran them with my in-house distributed training code.

I only tested and compared against FLAN-T5 on the MMLU dataset. It turns out my results are much lower than those of the official FLAN-T5 checkpoints.
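For context, a minimal sketch of the kind of comparison described here: loading the official FLAN-T5 checkpoint from Hugging Face and querying it with an MMLU-style multiple-choice prompt. The checkpoint size and prompt formatting are assumptions; the standard MMLU evaluation uses 5-shot prompts and averages accuracy per subject.

```python
# Sketch: query an official FLAN-T5 checkpoint on an MMLU-style question.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-large"  # official checkpoint; size is an assumption
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model.eval()

# Zero-shot, illustrative prompt format; real MMLU runs are 5-shot.
prompt = (
    "The following is a multiple choice question about astronomy.\n\n"
    "What is the closest star to the Sun?\n"
    "(A) Sirius (B) Proxima Centauri (C) Betelgeuse (D) Vega\n"
    "Answer:"
)

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```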
