Support ILQL for T5 model, Fix PPO T5 for refactored code #290
Conversation
Glad that it works overall, but we still have to find a way to refactor the many repetitions here, at least up to the same level as is done in PPO.
Hi, @PhungVanDuy, great work! I've left some feedback to look at when you get a chance 👍 👍
input_ids: TensorType["query_size"]
attention_mask: TensorType["query_size"]
decoder_input_ids: TensorType["reward_size"]
Can `decoder_input_ids` be moved into `ILQLElement`, possibly with a `None` default? Seems like a large amount of duplicate code for one extra field (this propagates down elsewhere in the package, e.g. to `ilql_seq2seq_collate_fn`). Same question for `ILQLSeq2SeqBatch`.
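The suggested merge can be sketched as a single element type with an optional decoder field. This is only an illustration of the reviewer's idea, not trlx's actual `ILQLElement` (which has additional fields, elided here); the tensor annotations are replaced with `Any` plus comments to keep the sketch dependency-free:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ILQLElement:
    """Hypothetical unified element covering both causal and seq2seq
    models, instead of a separate ILQLSeq2SeqElement. Other ILQL fields
    are elided for brevity."""

    input_ids: Any       # TensorType["query_size"]
    attention_mask: Any  # TensorType["query_size"]
    rewards: Any         # TensorType["reward_size"]
    # Present only for encoder-decoder (seq2seq) models; None for causal LMs.
    decoder_input_ids: Optional[Any] = None
```

A collate function could then branch on `element.decoder_input_ids is None` rather than duplicating the whole element/batch/collate stack per architecture.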
I prefer to keep this one separate for easier maintenance. We can re-design it once we find a way to separate "seq2seq", like you said here: #290 (comment)
Tokenizes samples and shapes rewards into proper tensors and then inserts the resulting dataset into the trainer
"""

if self.config.model.model_arch_type == "seq2seq":
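The tokenize-and-shape step the docstring describes can be sketched roughly as below. The helper name, the callable-tokenizer interface, and the plain-list output (standing in for padded tensors) are all assumptions for illustration, not trlx's actual API:

```python
def tokenize_and_pad(samples, rewards, tokenize, pad_id=0):
    """Turn raw text samples and per-sample rewards into padded id and
    attention-mask lists of equal length, ready to be stacked into tensors.

    `tokenize` is any callable mapping a string to a list of token ids
    (e.g. a Hugging Face tokenizer's encode method)."""
    encoded = [tokenize(s) for s in samples]
    max_len = max(len(ids) for ids in encoded)
    input_ids, attention_mask = [], []
    for ids in encoded:
        pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * pad)          # right-pad to max_len
        attention_mask.append([1] * len(ids) + [0] * pad)  # 1 = real token
    return input_ids, attention_mask, list(rewards)
```

For seq2seq models, the same shaping would additionally apply to the decoder-side ids, which is where the `model_arch_type` conditional above comes in.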
Aside: At some point, we need to design a better way to handle these arch-type conditionals 😅
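One possible shape for that better design is a small registry keyed by arch type, so call sites dispatch through one table instead of scattering `if arch_type == "seq2seq"` checks. The registry, decorator, and handler names here are illustrative only, not trlx's API:

```python
# Maps a model_arch_type string to a handler class (hypothetical design).
ARCH_HANDLERS = {}


def register_arch(name):
    """Class decorator that records a handler under its arch-type name."""
    def wrap(cls):
        ARCH_HANDLERS[name] = cls
        return cls
    return wrap


@register_arch("causal")
class CausalHandler:
    def loss_inputs(self, batch):
        # Decoder-only models need just the input ids.
        return {"input_ids": batch["input_ids"]}


@register_arch("seq2seq")
class Seq2SeqHandler:
    def loss_inputs(self, batch):
        # Encoder-decoder models additionally need decoder-side ids.
        return {"input_ids": batch["input_ids"],
                "decoder_input_ids": batch["decoder_input_ids"]}


def get_arch_handler(name):
    """Look up and instantiate the handler for an arch type."""
    return ARCH_HANDLERS[name]()
```

Each architecture-specific difference then lives in one handler class, and adding a new arch type means registering one class rather than touching every conditional.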
Looks good on my end! Great work :)
Thank you so much for your comment!
Adapting the ILQL implementation for causal models, this PR supports ILQL for the T5 model.
Related issue: Support for T5 for ILQL #204
Add ILQL for the T5 model
Run the summarization example with the OpenAI dataset
Add sentiment examples for PPO T5
Add tests
ILQL T5 on OpenAI Dataset: https://wandb.ai/pvduy/trlx/runs/sm5gug89
Sentiment T5: https://wandb.ai/pvduy/trlx/runs/vgvph8cc