About Multi-image preprocessing, such as dino, voxelize and 3d vae #177

LeeCASC · 2025-02-10T02:56:28Z

I would like to confirm the preprocessing pipeline.
First, 150 images are fed into the DINOv2 model, giving features of shape (150, 1024, 37, 37).
Then, in the voxelization step, the voxels are projected onto the image plane, the features are projected back into voxel space, and the values are averaged per voxel. Does this give a feature of shape (1024, 64, 64, 64)? Does the averaging reduce the number of channels (1024)? And how are the resulting features filtered from 64x64x64 = 262,144 voxels down to 20,000?
Finally, the pretrained 3D VAE is used to process the features. What is this pretrained model, and what data was used to pretrain it?

JeffreyXiang · 2025-02-12T06:29:48Z

Hi, we provide dataset_toolkits, which contains the DINO feature fusion process. The VAE was also trained on the 500K assets we filtered.
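
For readers following the thread, the fusion step described in the question can be sketched roughly as below. This is an illustration only, not the code in dataset_toolkits: the function `fuse_features`, the toy cameras `front` and `side`, and the nearest-patch lookup are all hypothetical. It assumes features are computed only for occupied voxels, which would explain going from 64x64x64 cells to roughly 20,000, and it shows that averaging over views leaves the channel count unchanged.

```python
def fuse_features(voxel_centers, views):
    """Average per-view patch features onto each occupied voxel.

    voxel_centers: list of (x, y, z) in [-0.5, 0.5]^3, occupied voxels only
                   (assumption: empty voxels are never processed).
    views: list of (project, feature_map) pairs, where project maps a 3D
           point to normalized image coords (u, v) in [0, 1), and
           feature_map is an H x W grid of C-dimensional feature lists.
    Returns one C-dimensional feature per voxel, i.e. shape (N, C).
    """
    fused = []
    for center in voxel_centers:
        total, count = None, 0
        for project, feature_map in views:
            u, v = project(center)
            if not (0.0 <= u < 1.0 and 0.0 <= v < 1.0):
                continue  # voxel falls outside this view
            height, width = len(feature_map), len(feature_map[0])
            # Nearest-patch lookup; a real pipeline would more likely interpolate.
            feat = feature_map[int(v * height)][int(u * width)]
            if total is None:
                total = list(feat)
            else:
                total = [a + b for a, b in zip(total, feat)]
            count += 1
        # The average runs over views, so the channel count C is unchanged.
        fused.append([t / count for t in total] if count else None)
    return fused


# Toy orthographic "cameras" (hypothetical stand-ins for real camera matrices).
def front(p):
    return p[0] + 0.5, p[1] + 0.5


def side(p):
    return p[2] + 0.5, p[1] + 0.5


# Two 2x2 feature maps with C = 3 channels each.
map_a = [[[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]],
         [[3.0, 0.0, 0.0], [4.0, 0.0, 0.0]]]
map_b = [[[0.0, 2.0, 0.0], [0.0, 4.0, 0.0]],
         [[0.0, 6.0, 0.0], [0.0, 8.0, 0.0]]]

occupied = [(-0.25, -0.25, -0.25), (0.25, 0.25, -0.25)]
fused = fuse_features(occupied, [(front, map_a), (side, map_b)])
print(fused)  # one 3-channel feature per occupied voxel
```

With 150 views and 1024-channel DINOv2 features, the same loop would produce an (N, 1024) array, where N is the number of occupied voxels rather than the full 262,144 grid cells.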
