-
Notifications
You must be signed in to change notification settings - Fork 476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve PPO readability #210
Conversation
@reciprocated to help with documenting this code, are you able to confirm what trlx/trlx/trainer/accelerate_base_trainer.py Lines 445 to 449 in 84a0711
Git blame suggests this is your code. My understanding is it runs the same batch multiple times - is that as expected? That is to say it lets the user repeat a batch, rather than e.g. having multiple steps e.g. turns in a text-based game (which I don't think the library supports). |
Hey there!
|
Awesome thanks for the quick reply! I'll add this in to the comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Splendid, what's left is to merge changes from the main and do a pre-commit run and I've also left a small suggestion to remain consistent in naming throughout. Thanks for the effort @alan-cooney @jezgillen
Let's get this merged |
5f67d72
to
5866b97
Compare
03a7f96
to
96b3d79
Compare
Thanks for the review @reciprocated Note I've had to resolve a tonne of merge conflicts, so it's worth checking those are in line with expectations (they're primarily with your commits). Given the size of this PR please can you take a look as soon as possible & merge, otherwise we'll likely get other merge conflicts as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Marvellous, merging!
https://wandb.ai/sorry/trlx/reports/Improve-PPO-readability-210---VmlldzozNDg3ODEx
Thanks for the quick review! |
Adds mostly comments and renames a few variables to be more descriptive. Also deletes two unused methods.
I've run an example with wandb to check this doesn't change anything unexpectedly - https://wandb.ai/alancooney/trlx/runs/aew4nu69 (same as before the changes)
Co-authored-by: @jezgillen