Getting rewards always 1 computed in utils/logits_processor.py file #4

RadiantCrystal · 2024-09-12T18:27:04Z

Hi Authors,

I am reimplementing your code for a project and have been going through the codebase and the algorithm. Could you point me to the code that implements line 6 of the algorithm?

6: p_t ← softmax(z_t + βρ_t) // compute reweighted distribution

**In my experiments the rewards are always 1, so it seems I am missing something.**
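
For reference, here is a minimal sketch of how line 6 could be computed, written in plain Python rather than taken from the repository. The names `softmax`, `reweight`, `logits`, `rewards`, and `beta` are hypothetical and chosen only for illustration; the sketch assumes z_t is a vector of next-token logits and ρ_t is a per-token reward vector of the same length. It also shows one consequence of the formula itself: because softmax is invariant to adding a constant to every logit, a reward vector that is 1 for every token leaves the distribution unchanged.

```python
import math

def softmax(xs):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def reweight(logits, rewards, beta):
    # Hypothetical helper: p_t = softmax(z_t + beta * rho_t).
    if len(logits) != len(rewards):
        raise ValueError("logits and rewards must have the same length")
    return softmax([z + beta * r for z, r in zip(logits, rewards)])

# Illustrative values only; not taken from the repository or any experiment.
z_t = [2.0, 1.0, 0.5, -1.0]

p_plain = softmax(z_t)
# Every token gets reward 1: the shift cancels inside the softmax.
p_const = reweight(z_t, [1.0, 1.0, 1.0, 1.0], beta=5.0)
# Only the second token is rewarded: probability mass moves toward it.
p_varied = reweight(z_t, [0.0, 1.0, 0.0, 0.0], beta=5.0)
```

Under these assumptions, `p_const` matches `p_plain`, while `p_varied` puts more mass on the rewarded token. So if the rewards really are 1 for every candidate token, the reweighting step would have no visible effect on sampling.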
