SimbaV2 is a reinforcement learning architecture designed to stabilize training via hyperspherical normalization. By increasing model capacity and compute, SimbaV2 achieves state-of-the-art results on 57 continuous control tasks from MuJoCo, DMControl, MyoSuite, and HumanoidBench.
Hojoon Lee* Youngdo Lee* Takuma Seno Donghu Kim Peter Stone Jaegul Choo
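To illustrate the core idea, the snippet below projects each layer's features onto the unit $\ell_2$ sphere after the nonlinearity. It is a minimal sketch in JAX, not the repository's implementation; `l2_normalize` and `hyper_block` are illustrative names.

```python
# Minimal sketch of hyperspherical (unit-norm) feature normalization in JAX.
# This is an illustrative approximation, not the exact SimbaV2 implementation;
# `l2_normalize` and `hyper_block` are hypothetical helper names.
import jax
import jax.numpy as jnp

def l2_normalize(x, eps=1e-6):
    """Project each feature vector onto the unit l2 sphere."""
    return x / (jnp.linalg.norm(x, axis=-1, keepdims=True) + eps)

def hyper_block(params, x):
    """Linear layer, nonlinearity, then hyperspherical normalization."""
    h = x @ params["w"] + params["b"]
    h = jax.nn.relu(h)
    return l2_normalize(h)  # features now lie on the unit hypersphere

# Example usage with random parameters.
key = jax.random.PRNGKey(0)
params = {
    "w": jax.random.normal(key, (8, 32)) / jnp.sqrt(8.0),
    "b": jnp.zeros((32,)),
}
x = jax.random.normal(key, (4, 8))
y = hyper_block(params, x)
print(jnp.linalg.norm(y, axis=-1))  # ~1.0 for every row
```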
We compare SimbaV2 to the original Simba by tracking:
- (a) Average normalized return across tasks.
- (b) Weighted sum of $\ell_2$-norms of all intermediate features in critics.
- (c) Weighted sum of $\ell_2$-norms of all critic parameters.
- (d) Weighted sum of $\ell_2$-norms of all gradients in critics.
- (e) Effective learning rate (ELR) of the critics.
SimbaV2 consistently maintains stable norms and ELR, while Simba shows divergent fluctuations.
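As a rough sketch of how such norm diagnostics can be computed over a parameter or gradient pytree (the exact per-layer weighting used in the paper may differ), `weighted_l2_norm` below is an illustrative helper, not repository code.

```python
# Illustrative sketch of tracking l2-norm diagnostics over a pytree of
# critic parameters or gradients; the exact per-layer weighting used in
# the paper may differ, and `weighted_l2_norm` is a hypothetical helper.
import jax
import jax.numpy as jnp

def weighted_l2_norm(tree):
    """Sum of per-leaf l2 norms, weighted by each leaf's share of parameters."""
    leaves = jax.tree_util.tree_leaves(tree)
    total = sum(leaf.size for leaf in leaves)
    return sum((leaf.size / total) * jnp.linalg.norm(leaf) for leaf in leaves)

# Example: compare the parameter norm and gradient norm of a toy critic.
params = {"w1": jnp.ones((4, 8)), "w2": jnp.ones((8, 1))}
grads = jax.tree_util.tree_map(lambda p: 0.01 * p, params)
print(weighted_l2_norm(params), weighted_l2_norm(grads))
```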
We scale model parameters by increasing critic width and scale compute via the update-to-data (UTD) ratio. We also explore resetting vs. non-resetting training:
- DMC-Hard (7 tasks): $\texttt{dog}$ and $\texttt{humanoid}$ embodiments.
- HBench-Hard (5 tasks): $\texttt{run}$, $\texttt{balance-simple}$, $\texttt{sit-hard}$, $\texttt{stair}$, $\texttt{walk}$.
On these challenging subsets, SimbaV2 benefits from increasing model size and UTD, while Simba plateaus. Notably, SimbaV2 scales smoothly with UTD even without resets, and resetting can degrade its performance.
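To make the UTD ratio concrete, the toy loop below performs `utd_ratio` gradient updates per collected environment transition. It is a self-contained illustration; none of the functions come from this repository.

```python
# Hypothetical skeleton illustrating the update-to-data (UTD) ratio: for every
# environment transition collected, the agent performs `utd_ratio` gradient
# updates. All components here are toy stand-ins, not this repository's code.
import random

def train(env_step, update, num_env_steps, utd_ratio=2):
    buffer = []
    num_updates = 0
    for _ in range(num_env_steps):
        buffer.append(env_step())          # collect 1 transition (data)
        for _ in range(utd_ratio):         # perform `utd_ratio` updates
            update(random.choice(buffer))
            num_updates += 1
    return num_updates

# With utd_ratio=2, 100 environment steps yield 200 gradient updates.
n = train(env_step=lambda: 0.0, update=lambda batch: None,
          num_env_steps=100, utd_ratio=2)
print(n)  # 200
```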
SimbaV2 outperforms competing RL algorithms, with performance improving as compute increases.
We use the Gymnasium 1.0 API, which provides seamless integration with diverse RL environments.
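For reference, a minimal Gymnasium-style interaction loop looks like the following; the environment id is just an example, not a task from our benchmark list.

```python
# Minimal Gymnasium interaction loop; the environment id below is only an
# example and the random policy is a placeholder for a trained agent.
import gymnasium as gym

env = gym.make("HalfCheetah-v4")
obs, info = env.reset(seed=0)
for _ in range(1000):
    action = env.action_space.sample()      # replace with the agent's policy
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```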
We provide a Dockerfile for easy installation. You can build the docker image by running:
docker build . -t scale_rl .
docker run --gpus all -v .:/home/user/scale_rl -it scale_rl /bin/bash
If you prefer to install dependencies manually, install them with pip or conda by following the guidelines below.
# Use pip
pip install -r deps/requirements.txt
# Or use conda
conda env create -f deps/environment.yaml
pip install -U "jax[cuda12]==0.4.25" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
# If you want to execute multiple runs on a single GPU, we recommend setting this variable.
export XLA_PYTHON_CLIENT_PREALLOCATE=false
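By default, JAX preallocates most of the GPU memory at startup, so disabling preallocation allows several experiment processes to share one device. If you launch runs from Python rather than the shell, the same variable can be set before importing JAX; a minimal sketch:

```python
# Disable JAX's GPU memory preallocation before JAX is imported, so multiple
# processes can share a single GPU. Equivalent to the shell export above.
import os
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"

import jax
print(jax.devices())  # memory is now allocated on demand
```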
Please see the installation instructions for MuJoCo.
# Additional environment variables for headless rendering
export MUJOCO_GL="egl"
export MUJOCO_EGL_DEVICE_ID="0"
export MKL_SERVICE_FORCE_INTEL="0"
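To verify that headless rendering works, one quick check is to render a single frame through Gymnasium; the environment id below is only an example.

```python
# Quick sanity check for headless MuJoCo rendering via EGL; run after setting
# the environment variables above. The environment id is only an example.
import gymnasium as gym

env = gym.make("Humanoid-v4", render_mode="rgb_array")
env.reset(seed=0)
frame = env.render()   # returns an RGB array if EGL rendering is working
print(frame.shape)     # e.g. (480, 480, 3)
env.close()
```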
git clone https://github.com/joonleesky/humanoid-bench
cd humanoid-bench
pip install -e .
git clone --recursive https://github.com/joonleesky/myosuite
cd myosuite
pip install -e .
We provide examples of how to train SAC agents with the SimbaV2 architecture.
To run a single online RL experiment:
python run_online.py
To run a single offline RL experiment:
python run_offline.py
To benchmark the algorithm on all environments:
python run_parallel.py \
--task all \
--device_ids <list of gpu devices to use> \
--num_seeds <num_seeds> \
--num_exp_per_device <number>
Please refer to /analysis to visualize the experimental results provided in the paper.
This project is released under the Apache 2.0 license.
If you find our work useful, please consider citing our paper as follows:
@article{lee2025hyperspherical,
title={Hyperspherical Normalization for Scalable Deep Reinforcement Learning},
author={Lee, Hojoon and Lee, Youngdo and Seno, Takuma and Kim, Donghu and Stone, Peter and Choo, Jaegul},
journal={arXiv preprint arXiv:2502.15280},
year={2025}
}