Hi, thanks for sharing the datasets! I'm trying to train a Flan model using T5 and other backbone models. However, I'm not sure how well I have reproduced your results. Specifically, I got much lower MMLU scores. Could you please share the training loss curve (or simply the loss at convergence)? Below is mine:
I was using similar settings (batch size = 80, max_seq_len = 2300).
The final loss is around 0.6 after smoothing. What are the official values?
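Since a "loss after smoothing" figure depends on how the curve is smoothed, here is a minimal sketch of one common choice, a bias-corrected exponential moving average. The function name `ema_smooth` and the weight of 0.9 are assumptions for illustration only; the post does not say which smoothing method or weight produced the 0.6 value.

```python
def ema_smooth(values, weight=0.9):
    """Bias-corrected exponential moving average of a list of loss values.

    `weight` (assumed 0.9 here) must be in [0, 1); higher means smoother.
    """
    if not 0.0 <= weight < 1.0:
        raise ValueError("weight must be in [0, 1)")
    smoothed = []
    running = 0.0
    for step, value in enumerate(values, start=1):
        # Standard EMA update.
        running = weight * running + (1.0 - weight) * value
        # Divide out the bias toward the zero initial value.
        smoothed.append(running / (1.0 - weight ** step))
    return smoothed


if __name__ == "__main__":
    # Hypothetical raw losses, not real training data.
    raw_losses = [2.0, 1.5, 1.2, 0.9, 0.8, 0.7]
    print(ema_smooth(raw_losses))
```

When comparing final losses across runs, using the same smoothing weight on both curves keeps the numbers comparable.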