Rewards computed in utils/logits_processor.py are always 1 #4

Open
RadiantCrystal opened this issue Sep 12, 2024 · 0 comments

RadiantCrystal commented Sep 12, 2024

Hi Authors,

I am reimplementing your code for my project and have been going through your codebase and algorithm. Could you please point me to the code that implements line 6 of the algorithm?

6: p_t ← softmax(z_t + β ρ_t) // compute reweighted distribution

**In my experiments, every reward comes out as 1, so it seems I am missing something.**
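
For reference, here is a minimal sketch of what line 6 computes, written in plain Python. It is not the repository's implementation: the names `softmax`, `reweight`, `logits`, `rewards`, and `beta` are assumptions chosen for illustration, and the numbers are made up. It also shows one thing that may be worth checking. Softmax is invariant to adding the same constant to every logit, so if every reward is 1, the reweighted distribution equals the original one.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reweight(logits, rewards, beta):
    """Sketch of line 6: p_t = softmax(z_t + beta * rho_t).

    `logits` plays the role of z_t and `rewards` the role of rho_t
    (hypothetical names, not taken from the repository).
    """
    if len(logits) != len(rewards):
        raise ValueError("logits and rewards must have the same length")
    return softmax([z + beta * r for z, r in zip(logits, rewards)])


# Illustrative values only.
logits = [2.0, 1.0, 0.1]
base = softmax(logits)

# All rewards equal to 1: every logit shifts by the same amount (beta),
# so the distribution is unchanged up to floating-point error.
uniform = reweight(logits, [1.0, 1.0, 1.0], beta=5.0)

# Rewards that differ across tokens do change the distribution.
varied = reweight(logits, [0.0, 1.0, 0.0], beta=5.0)
```

Comparing `base`, `uniform`, and `varied` shows that the reweighting only has an effect when the rewards differ across candidate tokens.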
