
Update TrainConfig optimizer hyperparameters #82

Merged: 2 commits into CarperAI:master on Nov 7, 2022

Conversation

jon-tow
Collaborator

@jon-tow jon-tow commented Nov 6, 2022

This PR provides the following updates based on observations from issue #53:

  • Adds the previously unused weight_decay TrainConfig parameter to AccelerateRLModel optimizer initialization.
  • Adds an opt_eps TrainConfig parameter to the AccelerateRLModel to allow users to specify an optimizer's epsilon value.
  • Renames the learning_rate_init and learning_rate_target TrainConfig parameters to lr_init and lr_target, respectively, for consistency with learning-rate naming conventions in torch.optim algorithms.
  • Removes the unused lr_ramp_steps and lr_decay_steps TrainConfig parameters, which appear to be leftovers from the magiCARP lr scheduler.
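
The changes above can be sketched as follows. The field names (lr_init, lr_target, opt_eps, weight_decay) come from this PR, but the TrainConfig defaults and the kwargs helper here are illustrative, not the exact trlx code:

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # Fields renamed/added by this PR; the defaults shown are
    # illustrative, not necessarily the repo's actual values.
    lr_init: float = 1.0e-4       # was learning_rate_init
    lr_target: float = 1.0e-6     # was learning_rate_target
    opt_eps: float = 1.0e-8       # new: user-configurable optimizer epsilon
    weight_decay: float = 1.0e-6  # previously unused; now forwarded to the optimizer


def optimizer_kwargs(config: TrainConfig) -> dict:
    """Hypothetical helper showing how the new fields would flow into
    a torch.optim.AdamW(...) call in AccelerateRLModel."""
    return {
        "lr": config.lr_init,
        "eps": config.opt_eps,
        "weight_decay": config.weight_decay,
    }


print(optimizer_kwargs(TrainConfig()))
```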

@jon-tow jon-tow marked this pull request as draft November 7, 2022 04:19
@jon-tow jon-tow marked this pull request as ready for review November 7, 2022 05:50
@jon-tow jon-tow marked this pull request as draft November 7, 2022 05:52
@jon-tow jon-tow marked this pull request as ready for review November 7, 2022 17:05
@LouisCastricato
Contributor

@herbiebradley since this is for your issue.

@herbiebradley herbiebradley left a comment


Looks good! Can we change the default epsilon to 1.0e-7 (the default in TensorFlow), since the ablation paper found it to be significantly better?

Also, does anyone know where the weight decay of 1.0e-6 comes from, since it is significantly lower than the default?

@maxreciprocate
Collaborator

> Also, does anyone know where the weight decay of 1.0e-6 comes from, since it is significantly lower than the default?

From here https://github.com/EleutherAI/magiCARP/blob/3537b4d7c98a4a384964856740a65731746e04fe/configs/base_config.yml#L18

Collaborator

@maxreciprocate maxreciprocate left a comment


lgtm!

@LouisCastricato LouisCastricato merged commit 0270960 into CarperAI:master Nov 7, 2022
@jon-tow jon-tow deleted the update-optim-hparams branch November 7, 2022 22:55
@jon-tow
Collaborator Author

jon-tow commented Nov 7, 2022

@herbiebradley I intentionally left the default $\epsilon$ at 1.0e-8, as I wasn't 100% sure the improvement would carry over directly to AdamW (we do not use vanilla Adam). I'd like to run some experiments before updating it in another PR, if that's okay.
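
The sensitivity under discussion can be illustrated numerically: an Adam/AdamW update has magnitude roughly lr * m_hat / (sqrt(v_hat) + eps), so raising eps from 1.0e-8 to 1.0e-7 mostly damps updates for parameters whose second-moment estimate v_hat is tiny, while leaving typical parameters nearly unchanged. This is illustrative arithmetic, not trlx code:

```python
import math


def adam_step_size(lr: float, m_hat: float, v_hat: float, eps: float) -> float:
    # Magnitude of a single Adam/AdamW parameter update,
    # given bias-corrected first/second moment estimates.
    return lr * m_hat / (math.sqrt(v_hat) + eps)


lr, m_hat = 1.0e-4, 1.0e-4
for v_hat in (1.0e-4, 1.0e-12):  # "typical" vs near-zero second moment
    for eps in (1.0e-8, 1.0e-7):
        print(f"v_hat={v_hat:g} eps={eps:g} step={adam_step_size(lr, m_hat, v_hat, eps):.3e}")
```

For v_hat = 1.0e-4 the two epsilons give almost identical steps, but for v_hat = 1.0e-12 the larger epsilon visibly shrinks the update, which is one way the choice of default can matter.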

@jon-tow
Collaborator Author

jon-tow commented Nov 7, 2022

cc #76 for awareness

@ayulockin
Contributor

Thanks for tagging the issue @jon-tow.
