GitHub - bradleybauer/othello: A place to share my experiments with reinforcement learning in othello

# TODO

Policy & Value network shared backbone

Opponent Sampling: During experience generation, opponents are sampled with probabilities inversely proportional to the current policy's performance against them. In other words, opponents against whom the policy has a lower win rate are chosen more frequently, allowing the policy to focus on its weaknesses.

The top plot shows the current win rate for each opponent, while the bottom plot displays the sampling weight assigned to each opponent.
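The weighting described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the repository's actual code: the function names, the opponent names, and the smoothing constant `eps` are all hypothetical, and the source does not say exactly how the inverse weighting is computed or normalized.

```python
import random


def sampling_weights(win_rates, eps=0.05):
    """Map each opponent to a sampling probability inversely
    proportional to the current policy's win rate against it.

    eps is an assumed smoothing term that avoids division by zero
    when the win rate against some opponent is 0.
    """
    raw = {name: 1.0 / (wr + eps) for name, wr in win_rates.items()}
    total = sum(raw.values())
    # Normalize so the weights form a probability distribution.
    return {name: w / total for name, w in raw.items()}


def sample_opponent(win_rates, rng=random, eps=0.05):
    """Draw one opponent name according to the inverse-win-rate weights."""
    weights = sampling_weights(win_rates, eps)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical win rates of the current policy against three opponents.
win_rates = {"random": 0.95, "greedy": 0.70, "old_checkpoint": 0.40}
weights = sampling_weights(win_rates)
opponent = sample_opponent(win_rates, rng=random.Random(0))
```

With these example numbers, `old_checkpoint` receives the largest weight and `random` the smallest, so experience generation spends more games on the opponent the policy currently beats least often.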

Folders and files (86 commits):

.gitignore
README.md
checkpoint.pth
compare_best_policies_between_ckpts.py
compare_policies_against_baseline.py
export.py
othello.py
othello_env.py
policy_function.py
sampling.png
sampling2.png
simulation_test.py
train_ppo.py
train_vpg.py
value_function.py
vis_winrates.py
winrate.txt
