R2D2 #34

garymm · 2025-03-03T05:34:57Z

Adds the R2D2 agent and the changes to Earl needed to implement it.

Change GymnasiumLoop to take an env factory. The assumption that the env could be duplicated with copy.deepcopy() did not hold.
Support envpool in GymnasiumLoop.
Support updating experience state in Agent.loss().
Bug fixes in GymnasiumLoop.
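
To illustrate the env factory change, here is a minimal sketch of the pattern in plain Python. `ToyEnv` and `ToyLoop` are hypothetical stand-ins and are not Earl's `GymnasiumLoop` or its real signature. The point is only that the loop calls a zero-argument factory once per env instead of copying an existing env object.

```python
from typing import Callable, List


class ToyEnv:
    """Hypothetical stand-in for a Gymnasium-style environment."""

    def __init__(self) -> None:
        self.steps = 0

    def step(self) -> int:
        self.steps += 1
        return self.steps


class ToyLoop:
    """Hypothetical loop that builds its envs from a factory (not Earl's API)."""

    def __init__(self, env_factory: Callable[[], ToyEnv], num_envs: int) -> None:
        # Each env is constructed fresh, so the env never has to be copyable.
        self.envs: List[ToyEnv] = [env_factory() for _ in range(num_envs)]


loop = ToyLoop(env_factory=ToyEnv, num_envs=3)
loop.envs[0].step()  # stepping one env leaves the others untouched
```

Any callable that returns a new env works as the factory, for example a `lambda` that closes over configuration.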


codecov-commenter · 2025-03-03T06:33:00Z

Codecov Report

Attention: Patch coverage is 86.20690% with 88 lines in your changes missing coverage. Please review.

Project coverage is 93.08%. Comparing base (15dbe65) to head (a18c16f).
Report is 1 commit behind head on master.

Files with missing lines	Patch %	Lines
earl/agents/r2d2/utils.py	63.11%	45 Missing ⚠️
earl/agents/r2d2/r2d2.py	87.97%	35 Missing ⚠️
earl/agents/r2d2/networks.py	96.15%	6 Missing ⚠️
earl/environment_loop/gymnasium_loop.py	94.59%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master      #34      +/-   ##
==========================================
- Coverage   96.74%   93.08%   -3.67%     
==========================================
  Files          13       16       +3     
  Lines        1168     1750     +582     
==========================================
+ Hits         1130     1629     +499     
- Misses         38      121      +83


garymm added 30 commits February 19, 2025 22:56

R2D2 initial commit, not tested

0543ae4

more WIP. Trying to get Atari frames in

1055fbe

atari input at least doesn't crash

f9f60f3

add test for atari input and fix q value size

8ea67e1

fix static fields in r2d2 networks

a236ce0

gymnasium_loop: fix mixed up argument order

21327c0

r2d2 WIP

260cbf9

r2d2: some fixes. still not done

ab84e4a

r2d2: add value rescaling

844b796
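
For context on the value rescaling commit: the sketch below shows the invertible rescaling function commonly used with R2D2, h(x) = sign(x)(sqrt(|x| + 1) - 1) + eps * x, together with its closed-form inverse. The function names and the default `eps=1e-3` are assumptions made for illustration and are not taken from this PR's code.

```python
import math


def value_rescale(x: float, eps: float = 1e-3) -> float:
    """h(x) = sign(x) * (sqrt(|x| + 1) - 1) + eps * x."""
    return math.copysign(1.0, x) * (math.sqrt(abs(x) + 1.0) - 1.0) + eps * x


def inverse_value_rescale(y: float, eps: float = 1e-3) -> float:
    """Closed-form inverse of value_rescale for the same eps."""
    # Solve eps * s**2 + s - (1 + eps + |y|) = 0 for s = sqrt(|x| + 1).
    s = (math.sqrt(1.0 + 4.0 * eps * (abs(y) + 1.0 + eps)) - 1.0) / (2.0 * eps)
    return math.copysign(1.0, y) * (s * s - 1.0)


# Round trip: rescaling and then inverting recovers the input up to float error.
round_trip = inverse_value_rescale(value_rescale(100.0))
```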

port code over from acme/agents/jax/r2d2

940d216

I think the key is doing a reversed loop for the n step returns rather than a vmap
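
A minimal sketch of what a reversed loop for n-step returns can look like, in plain Python for readability. The function name `n_step_returns` and its argument layout are hypothetical and are not taken from Earl or Acme. It assumes `rewards[t]` and `discounts[t]` belong to the transition out of step `t`, and `values[t]` is the bootstrap value of the state reached by that transition. Near the end of the sequence the return is truncated to fewer steps.

```python
from typing import List, Sequence


def n_step_returns(
    rewards: Sequence[float],
    discounts: Sequence[float],
    values: Sequence[float],
    n: int,
) -> List[float]:
    """n-step bootstrapped returns computed with a reversed loop over offsets."""
    seq_len = len(rewards)
    # Start from the bootstrap values n - 1 steps ahead, repeating the last one.
    targets = list(values[n - 1:]) + [values[-1]] * (n - 1)
    # Pad so that indexing past the end adds no reward and applies no discount.
    r_pad = list(rewards) + [0.0] * (n - 1)
    d_pad = list(discounts) + [1.0] * (n - 1)
    # Fold in rewards from the farthest offset back to the nearest one.
    for i in reversed(range(n)):
        targets = [r_pad[i + t] + d_pad[i + t] * targets[t] for t in range(seq_len)]
    return targets


returns = n_step_returns(
    rewards=[1.0, 1.0, 1.0, 1.0],
    discounts=[0.5, 0.5, 0.5, 0.5],
    values=[10.0, 10.0, 10.0, 10.0],
    n=2,
)
print(returns)  # [4.0, 4.0, 4.0, 6.0]
```

Each pass of the loop depends on the result of the previous pass, which is why this is written as a sequential loop rather than as independent per-step computations.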

update experience state in loss

66507b3

to support setting priorities for replay

gymnasium_loop: fix bug. only copy one net replica

afb5a99

sharding: fix docstring

1ba65bc

r2d2: runs for 2 cycles!

9a892ac

change grad means metric names to make Mlflow happy

5647447

scripts to train r2d2 on atari

ad743ba

Currently crashes with regular Python, probably due to a buffer donation bug, but runs with debug Python. Not sure if it's working yet.

gymnasium_loop: fix buffer donation bug

db4d7c5

r2d2: experiment runner files

40749b2

does not appear to be learning cartpole, so something is wrong

fix bug in _sample_from_experience

7a82127

add more tests. test_r2d2_learns_cartpole currently fails.

use better import path

6489689

fix buffer update

18bd558

Before this fix, the update appears to have been a no-op, for some reason having to do with vmap.

minor cleanup

d444d6b

r2d2: epsilon-greedy and remove incremental updates

7ab4825

this matches the implementation in Acme

suppress warnings

961cb01

epsilon greedy schedule and more debugging

9e2525b

make lstm optional

a48c3cb

add learning rate schedule, make number of optimizations per cycle configurable, log hyperparameters

add TODO about stop grad after burn in

d6d001f

use optax.incremental_update

76e6f2f

r2d2: tests passing!

ac6bcde

Main thing changed was shrinking replay buffer

set adam eps value to what it was in r2d2 paper

68f52d8

garymm added 25 commits March 1, 2025 05:06

fix shard agent state

219916e

add render atari observe cycle

80381fe

WIP: asterix atari

f750cb0

run_atari: assert num envs per learner even

0f837eb

r2d2: support replaying larger batches

1981640

ignore warning triggered by envpool

8336410

test_r2d2: use env_factory

5e6f4f2

fix metrickey import

b881812

restore test_learns_cartpole

a8d2c08

set priority to 1 for new experience

db2ac3c

double replay batch size

2faf2b0

improve error message

84b81ef

fix dtype support in resnet

0e12c9c

gymnasium_loop: fix bug when len(learner_devices) > 1

cb5ceca

The actor state being returned was sharded for a single learner device

run minasterix longer

f0258a7

shrink atari game for TPU memory savings

vs code setting: ignore git limit warning

5b519b4

start to fix bazel test

14ce38d

fix gymnasium tests

727603f

fix run_experiment for env_factory

152145c

suppress false pyright error

a8fb7d7

rename and delete runners

f3a52c3

fix some broken stuff

4ad97b3

set long timeout for slow github runner

587a9df

shard test_run_experiment

9781362

split learns cartpole to separate test

36891f1

garymm enabled auto-merge (squash) March 3, 2025 06:14

shorten test

a18c16f

garymm merged commit 9fb5720 into master Mar 3, 2025
4 checks passed

garymm deleted the r2d2 branch March 3, 2025 06:32
