
Could you share the training loss to improve reproducibility? #83

Open · xuanqing94 opened this issue Jun 19, 2023 · 4 comments
xuanqing94 commented Jun 19, 2023

Hi, thanks for sharing the datasets! I'm trying to train a FLAN model using T5 and other backbone models, but I'm not confident in how well I reproduced your results. Specifically, I got much lower MMLU scores. Could you please share the training loss curve (or simply the loss at convergence)? Below is mine:
[image: training loss curve]

I was using similar settings (batch size = 80, max_seq_len = 2300).
The final loss is around 0.6 after smoothing. What are the official values?
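For anyone trying to reproduce a similar run, here is a minimal sketch of this kind of fine-tuning setup using Hugging Face Transformers. Only the effective batch size (80) and max sequence length (2300) come from the comment above; the dataset file, backbone checkpoint, learning rate, and epoch count are illustrative assumptions, not the actual setup.

```python
# Minimal FLAN-style fine-tuning sketch with Hugging Face Transformers.
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "t5-large"  # backbone checkpoint; assumption
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Hypothetical FLAN-mixture file with "inputs"/"targets" columns.
dataset = load_dataset("json", data_files="flan_mixture.jsonl", split="train")

max_seq_len = 2300  # from the comment above

def preprocess(example):
    model_inputs = tokenizer(example["inputs"], max_length=max_seq_len, truncation=True)
    labels = tokenizer(example["targets"], max_length=256, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, remove_columns=dataset.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-repro",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=10,  # 8 x 10 = effective batch size 80
    learning_rate=1e-4,              # assumption; tune as needed
    logging_steps=50,                # logs the training loss curve
    num_train_epochs=1,              # assumption
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```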

@StephennFernandes

Hey, could you please let me know where I can find the scripts/.gin files to train FLAN on t5x-based models?

@xuanqing94 (Author)

@StephennFernandes I can't help you with that because I'm using a PyTorch-based training framework.

@StephennFernandes

You mean you used the Hugging Face model and fine-tuned it on the FLAN datasets?

That works fine for me as well.

BTW, did you get results relatively similar to the official FLAN-T5's?

@xuanqing94 (Author)

I used checkpoints downloaded from Hugging Face, but I ran them with my in-house distributed training code.

I only tested and compared against FLAN-T5 on the MMLU dataset. It turns out my results are much lower than those of the official FLAN-T5 checkpoints.
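For context, a minimal sketch of the kind of comparison described here: loading the official FLAN-T5 checkpoint from Hugging Face and querying it with an MMLU-style multiple-choice prompt. The checkpoint size and prompt formatting are assumptions; the standard MMLU evaluation uses 5-shot prompts and averages accuracy per subject.

```python
# Sketch: query an official FLAN-T5 checkpoint on an MMLU-style question.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-large"  # official checkpoint; size is an assumption
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model.eval()

# Zero-shot, illustrative prompt format; real MMLU runs are 5-shot.
prompt = (
    "The following is a multiple choice question about astronomy.\n\n"
    "What is the closest star to the Sun?\n"
    "(A) Sirius (B) Proxima Centauri (C) Betelgeuse (D) Vega\n"
    "Answer:"
)

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```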
