Yongwei Chen¹ • Yushi Lan¹ • Shangchen Zhou¹ • Tengfei Wang² • Xingang Pan¹
¹S-lab, Nanyang Technological University
²Shanghai Artificial Intelligence Laboratory
CVPR 2025
[Demo video: `sar3d.mp4`]
- 🔄 Autoregressive Modeling
- ⚡️ Ultra-fast 3D Generation (<1s)
- 🔍 Detailed Understanding
We've tested SAR3D on the following environments:
- Rocky Linux 8.10 (Green Obsidian)
  - Python 3.9.8
  - PyTorch 2.2.2
  - CUDA 12.1
  - NVIDIA H200
- Ubuntu 20.04
  - Python 3.9.16
  - PyTorch 2.0.0
  - CUDA 11.7
  - NVIDIA A6000
- Clone the repository
git clone https://github.com/cyw-3d/SAR3D.git
cd SAR3D
- Set up environment
conda env create -f environment.yml
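After the environment is created, a quick sanity check such as the sketch below (it only assumes the new conda environment is active, with PyTorch installed from `environment.yml`) confirms that PyTorch sees your GPU and reports the expected CUDA version:

```python
# Minimal sanity check for the freshly created environment.
import torch

print("PyTorch:", torch.__version__)            # e.g. 2.2.2 or 2.0.0
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("CUDA version:", torch.version.cuda)  # e.g. 12.1 or 11.7
    print("GPU:", torch.cuda.get_device_name(0))
```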
- Download pretrained models 📥
The pretrained models will be automatically downloaded to the `checkpoints` folder on the first run.
You can also download them manually from our model zoo:
| Model | Description | Link |
|---|---|---|
| VQVAE | Base VQVAE model | vqvae-ckpt.pt |
| Generation | Image-conditioned model | image-condition-ckpt.pth |
| Generation | Text-conditioned model | text-condition-ckpt.pth |
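If you download the checkpoints manually, you can verify they are intact by loading each file with `torch.load` and checking that a state dict comes back. The sketch below assumes the files are placed under the default `checkpoints` folder with the names listed above:

```python
# Verify manually downloaded checkpoints (assumes the default `checkpoints` folder).
import os
import torch

CKPTS = [
    "checkpoints/vqvae-ckpt.pt",             # base VQVAE model
    "checkpoints/image-condition-ckpt.pth",  # image-conditioned generation model
    "checkpoints/text-condition-ckpt.pth",   # text-conditioned generation model
]

for path in CKPTS:
    if not os.path.exists(path):
        print(f"missing: {path}")
        continue
    state = torch.load(path, map_location="cpu")
    size = len(state) if hasattr(state, "__len__") else "?"
    print(f"loaded {path}: {size} top-level entries")
```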
- Run inference 🚀
To test the model on your own images:
- Place your test images in the `test_files/test_images` folder
- Run the inference script: `bash test_image.sh`
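If your source images are not already PNGs in that folder, a small helper like the one below can copy and convert them. This is only a convenience sketch (it assumes Pillow is installed and uses a hypothetical `my_images` source folder), not part of the repo:

```python
# Convenience sketch: copy your own images into test_files/test_images as RGB PNGs.
# `my_images` is a hypothetical source folder -- point it at your own images.
import os
from PIL import Image

SRC_DIR = "my_images"
DST_DIR = "test_files/test_images"

os.makedirs(DST_DIR, exist_ok=True)
for name in os.listdir(SRC_DIR):
    if not name.lower().endswith((".png", ".jpg", ".jpeg", ".webp")):
        continue
    img = Image.open(os.path.join(SRC_DIR, name)).convert("RGB")
    out_path = os.path.join(DST_DIR, os.path.splitext(name)[0] + ".png")
    img.save(out_path)
    print("wrote", out_path)
```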
To test the model on your own text prompts:
- Place your test prompts in the `test_files/test_text.json` file
- Run the inference script: `bash test_text.sh`
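The exact schema of `test_files/test_text.json` is defined by the example file shipped with the repo; purely as an illustration, if it were a plain list of prompt strings you could generate it like this (adjust to match the shipped file's format):

```python
# Illustrative only: writes test_files/test_text.json assuming a plain list of prompt strings.
# Check the example file in the repo for the actual schema.
import json

prompts = [
    "a wooden chair with a curved backrest",
    "a blue ceramic teapot",
]

with open("test_files/test_text.json", "w") as f:
    json.dump(prompts, f, indent=2)
```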
The dataset is available for download on Hugging Face.
It consists of 8 splits of preprocessed data based on G-buffer Objaverse, including:
- Rendered images
- Depth maps
- Camera poses
- Text descriptions
- Normal maps
- Latent embeddings
The dataset covers over 170K unique 3D objects, augmented to more than 630K data pairs. A `dataset.json` file is provided that maps object IDs to their corresponding categories.
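One way to fetch all splits programmatically is via `huggingface_hub` (this sketch assumes the package is installed; `<HF_DATASET_REPO>` is a placeholder for the actual dataset id shown on the Hugging Face page):

```python
# Sketch: download the dataset with huggingface_hub.
# <HF_DATASET_REPO> is a placeholder -- substitute the dataset id from the Hugging Face page.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="<HF_DATASET_REPO>",
    repo_type="dataset",
    local_dir="./dataset-root",
)
print("dataset downloaded to", local_path)
```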
After downloading and unzipping the dataset, you should have the following structure:
/dataset-root/
├── 1/
├── 2/
├── ...
├── 8/
│ └── 0/
│ ├── raw_image.png
│ ├── depth_alpha.jpg
│ ├── c.npy
│ ├── caption_3dtopia.txt
│ ├── normal.png
│ ├── ...
│ └── image_dino_embedding_lrm.npy
└── dataset.json
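Given this layout, each object's files can be inspected with standard tools. The sketch below (split `8`, object `0` as in the tree above; it assumes NumPy and Pillow are installed) loads the rendered image, camera parameters, caption, and DINO embedding for one object:

```python
# Sketch: inspect one object from the unzipped dataset (split "8", object "0" from the tree above).
import json
import numpy as np
from PIL import Image

obj_dir = "/dataset-root/8/0"

image = Image.open(f"{obj_dir}/raw_image.png")                   # rendered image
depth = Image.open(f"{obj_dir}/depth_alpha.jpg")                 # depth map
camera = np.load(f"{obj_dir}/c.npy")                             # camera pose
normal = Image.open(f"{obj_dir}/normal.png")                     # normal map
caption = open(f"{obj_dir}/caption_3dtopia.txt").read().strip()  # text description
dino = np.load(f"{obj_dir}/image_dino_embedding_lrm.npy")        # latent DINO embedding

with open("/dataset-root/dataset.json") as f:
    id_to_category = json.load(f)                                # object ID -> category mapping

print("image size:", image.size)
print("camera params shape:", camera.shape)
print("DINO embedding shape:", dino.shape)
print("caption:", caption)
```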
The following scripts allow you to train both image-conditioned and text-conditioned models using the dataset stored at the specified `<DATA_DIR>` location.
For image-conditioned model training:
bash train_image.sh <MODEL_DEPTH> <BATCH_SIZE> <GPU_NUM> <VQVAE_PATH> <OUT_DIR> <DATA_DIR>
For text-conditioned model training:
bash train_text.sh <MODEL_DEPTH> <BATCH_SIZE> <GPU_NUM> <VQVAE_PATH> <OUT_DIR> <DATA_DIR>
- Inference and Training Code for Image-conditioned Generation
- Dataset Release
- Inference Code for Text-conditioned Generation
- Training Code for Text-conditioned Generation
- VQVAE training code
- Code for Understanding
If you find this work useful for your research, please cite our paper:
@inproceedings{chen2024sar3d,
title={SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE},
author={Chen, Yongwei and Lan, Yushi and Zhou, Shangchen and Wang, Tengfei and Pan, Xingang},
booktitle={CVPR},
year={2025}
}