A place to share my experiments with reinforcement learning in othello

bradleybauer/othello

# TODO

  • Policy & Value network shared backbone

Opponent Sampling: During experience generation, each opponent is sampled with probability inversely proportional to the current policy's win rate against it. Opponents the policy beats less often are therefore chosen more frequently, which lets the policy focus on its weaknesses.

[Figure: sampling]

The top plot shows the current win rate for each opponent, while the bottom plot displays the sampling weight assigned to each opponent.
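
The weighting described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names, the opponent names, the example win rates, and the `eps` smoothing term are all assumptions. The text only says "inversely proportional", so the exact formula `1 / (win_rate + eps)` is a guess at one reasonable reading.

```python
import random


def sampling_weights(win_rates, eps=0.05):
    """Turn per-opponent win rates into normalized sampling probabilities.

    Assumption: raw weight = 1 / (win_rate + eps). The eps term (hypothetical)
    avoids division by zero when the policy has never beaten an opponent.
    """
    raw = {name: 1.0 / (wr + eps) for name, wr in win_rates.items()}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


def sample_opponent(win_rates, rng=random, eps=0.05):
    """Draw one opponent name according to the weights above."""
    weights = sampling_weights(win_rates, eps)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical opponents and win rates, for illustration only.
win_rates = {"random": 0.95, "greedy": 0.70, "checkpoint_10": 0.40}
probs = sampling_weights(win_rates)
opponent = sample_opponent(win_rates, rng=random.Random(0))
```

With these made-up numbers, `checkpoint_10` (lowest win rate) receives the largest probability and `random` the smallest, which matches the relationship between the two plots. A larger `eps` flattens the distribution; a smaller one concentrates sampling on the hardest opponents.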
