Mamba-YOLO-World

Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection

Haoxuan Wang, Qingdong He, Jinlong Peng, Hao Yang, Mingmin Chi, Yabiao Wang

arXiv

✨ News

🎉 Our paper has been accepted by ICASSP 2025 (Oral).

2024-10-30: 🤗 We provide the Model Weights and Visualization Results on HuggingFace.

2024-09-24: 🚀 We provide all the Model Weights for the community.

2024-09-14: 💎 We provide the Mamba-YOLO-World source code for the community.

2024-09-12: We provide the Visualization Results of ZERO-SHOT Inference on LVIS generated by Mamba-YOLO-World and YOLO-World for comparison.

Introduction

This repo contains the PyTorch implementation, pre-trained weights, and pre-training/fine-tuning code for Mamba-YOLO-World.

  • We present Mamba-YOLO-World, a novel YOLO-based OVD model employing the proposed MambaFusion-PAN as its neck architecture.

  • We introduce a State Space Model-based feature fusion mechanism consisting of a Parallel-Guided Selective Scan algorithm and a Serial-Guided Selective Scan algorithm, with O(N+1) complexity and globally guided receptive fields.

  • Experiments demonstrate that our model outperforms the original YOLO-World while maintaining comparable parameters and FLOPs. Additionally, it surpasses existing state-of-the-art OVD methods with fewer parameters and FLOPs.

📷 Visualization Results

Model Zoo

Zero-shot Evaluation on LVIS-minival dataset

Zero-shot Evaluation on COCO dataset

Fine-tuning Evaluation on COCO dataset

Getting started

1. Installation

Mamba-YOLO-World is developed based on torch==2.0.0, mamba-ssm==2.1.0, triton==2.1.0, supervision==0.20.0, mmcv==2.0.1, mmyolo==0.6.0, and mmdetection==3.3.0.

You need to link mmyolo under the third_party directory, as in the sketch below.
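
A minimal installation sketch, assuming a CUDA-enabled environment. The mmdetection package is published on PyPI as mmdet; the mmcv install route, clone path, and symlink step are illustrative assumptions, not the only way to set things up:

pip install torch==2.0.0 triton==2.1.0 mamba-ssm==2.1.0 supervision==0.20.0
pip install openmim && mim install mmcv==2.0.1   # or a prebuilt mmcv wheel matching your CUDA/torch
pip install mmdet==3.3.0 mmyolo==0.6.0
git clone https://github.com/open-mmlab/mmyolo.git /path/to/mmyolo
mkdir -p third_party
ln -s /path/to/mmyolo third_party/mmyolo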

2. Preparing Data

We provide details about the pre-training data in docs/data.

Evaluation

./tools/dist_test.sh configs/mamba2_yolo_world_s.py  CHECKPOINT_FILEPATH  num_gpus_per_node
./tools/dist_test.sh configs/mamba2_yolo_world_m.py  CHECKPOINT_FILEPATH  num_gpus_per_node
./tools/dist_test.sh configs/mamba2_yolo_world_l.py  CHECKPOINT_FILEPATH  num_gpus_per_node
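
For example, evaluating the small model on a single node with 8 GPUs (the checkpoint path and GPU count below are placeholders):

./tools/dist_test.sh configs/mamba2_yolo_world_s.py path/to/mamba2_yolo_world_s.pth 8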

Pre-training

./tools/dist_train.sh configs/mamba2_yolo_world_s.py  num_gpus_per_node  --amp
./tools/dist_train.sh configs/mamba2_yolo_world_m.py  num_gpus_per_node  --amp
./tools/dist_train.sh configs/mamba2_yolo_world_l.py  num_gpus_per_node  --amp
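
Here num_gpus_per_node is the number of GPUs on the node, and --amp enables automatic mixed-precision training (the standard flag in mmdetection-style training scripts). For example, pre-training the small model with 8 GPUs:

./tools/dist_train.sh configs/mamba2_yolo_world_s.py 8 --amp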

Fine-tuning

./tools/dist_train.sh configs/mamba2_yolo_world_s_mask-refine_finetune_coco.py  num_gpus_per_node --amp 
./tools/dist_train.sh configs/mamba2_yolo_world_m_mask-refine_finetune_coco.py  num_gpus_per_node --amp 
./tools/dist_train.sh configs/mamba2_yolo_world_l_mask-refine_finetune_coco.py  num_gpus_per_node --amp 

Demo

  • image_demo.py: inference on a single image or a directory of images (see the hedged example below)
  • video_demo.py: inference on videos
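
A hedged example invocation of image_demo.py; the positional arguments (config, checkpoint, image path, comma-separated text prompts) follow the YOLO-World demo scripts this repo builds on, so check the script's argparse for the exact interface:

python image_demo.py configs/mamba2_yolo_world_s.py path/to/checkpoint.pth path/to/image.jpg 'person,car,dog'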

Acknowledgement

We sincerely thank mmyolo, mmdetection, YOLO-World, Mamba and VMamba for providing their wonderful code to the community!

Citation

@inproceedings{wang2025mamba,
  title={Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection},
  author={Wang, Haoxuan and He, Qingdong and Peng, Jinlong and Yang, Hao and Chi, Mingmin and Wang, Yabiao},
  booktitle={ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2025},
  organization={IEEE}
}
