Support ILQL for T5 model, Fix PPO T5 for refactored code #290
Conversation
Glad that it works overall, but we still have to find a way to refactor the many repetitions here, at least up to the same level as is done in PPO.
Hi, @PhungVanDuy, great work! I've left some feedback to look at when you get a chance 👍 👍
input_ids: TensorType["query_size"]
attention_mask: TensorType["query_size"]
decoder_input_ids: TensorType["reward_size"]
Can `decoder_input_ids` be moved into `ILQLElement`, possibly with a `None` default? Seems like a large amount of duplicate code for one extra field (this propagates down elsewhere in the package, e.g. to `ilql_seq2seq_collate_fn`). Same question for `ILQLSeq2SeqBatch`.
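The suggested merge can be sketched as a single element type with an optional decoder field. This is only an illustration of the reviewer's idea, not trlx's actual `ILQLElement` (which has additional fields, elided here); the tensor annotations are replaced with `Any` plus comments to keep the sketch dependency-free:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ILQLElement:
    """Hypothetical unified element covering both causal and seq2seq
    models, instead of a separate ILQLSeq2SeqElement. Other ILQL fields
    are elided for brevity."""

    input_ids: Any       # TensorType["query_size"]
    attention_mask: Any  # TensorType["query_size"]
    rewards: Any         # TensorType["reward_size"]
    # Present only for encoder-decoder (seq2seq) models; None for causal LMs.
    decoder_input_ids: Optional[Any] = None
```

A collate function could then branch on `element.decoder_input_ids is None` rather than duplicating the whole element/batch/collate stack per architecture.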
I prefer to keep this one separate for easier maintenance. We can re-design it once we find a way to separate "seq2seq", like you said here: #290 (comment)
Tokenizes samples and shapes rewards into proper tensors and then inserts the resulting dataset into the trainer
"""

if self.config.model.model_arch_type == "seq2seq":
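The tokenize-and-shape step the docstring describes can be sketched roughly as below. The helper name, the callable-tokenizer interface, and the plain-list output (standing in for padded tensors) are all assumptions for illustration, not trlx's actual API:

```python
def tokenize_and_pad(samples, rewards, tokenize, pad_id=0):
    """Turn raw text samples and per-sample rewards into padded id and
    attention-mask lists of equal length, ready to be stacked into tensors.

    `tokenize` is any callable mapping a string to a list of token ids
    (e.g. a Hugging Face tokenizer's encode method)."""
    encoded = [tokenize(s) for s in samples]
    max_len = max(len(ids) for ids in encoded)
    input_ids, attention_mask = [], []
    for ids in encoded:
        pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * pad)          # right-pad to max_len
        attention_mask.append([1] * len(ids) + [0] * pad)  # 1 = real token
    return input_ids, attention_mask, list(rewards)
```

For seq2seq models, the same shaping would additionally apply to the decoder-side ids, which is where the `model_arch_type` conditional above comes in.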
Aside: At some point, we need to design a better way to handle these arch-type conditionals 😅
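One possible shape for that better design is a small registry keyed by arch type, so call sites dispatch through one table instead of scattering `if arch_type == "seq2seq"` checks. The registry, decorator, and handler names here are illustrative only, not trlx's API:

```python
# Maps a model_arch_type string to a handler class (hypothetical design).
ARCH_HANDLERS = {}


def register_arch(name):
    """Class decorator that records a handler under its arch-type name."""
    def wrap(cls):
        ARCH_HANDLERS[name] = cls
        return cls
    return wrap


@register_arch("causal")
class CausalHandler:
    def loss_inputs(self, batch):
        # Decoder-only models need just the input ids.
        return {"input_ids": batch["input_ids"]}


@register_arch("seq2seq")
class Seq2SeqHandler:
    def loss_inputs(self, batch):
        # Encoder-decoder models additionally need decoder-side ids.
        return {"input_ids": batch["input_ids"],
                "decoder_input_ids": batch["decoder_input_ids"]}


def get_arch_handler(name):
    """Look up and instantiate the handler for an arch type."""
    return ARCH_HANDLERS[name]()
```

Each architecture-specific difference then lives in one handler class, and adding a new arch type means registering one class rather than touching every conditional.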
Looks good on my end! Great work :)
Thank you so much for your comment!
Adapting the ILQL implementation for causal models, this PR supports ILQL for the T5 model.
Related issue: Support for T5 for ILQL #204
Add ILQL for the T5 model
Run the summarization example with the OpenAI dataset
Add sentiment examples for PPO T5
Add tests
ILQL T5 on OpenAI Dataset: https://wandb.ai/pvduy/trlx/runs/sm5gug89
Sentiment T5: https://wandb.ai/pvduy/trlx/runs/vgvph8cc