MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge


arXiv · Dataset · Papers with Code · Code · Website · Slides · PDF · Zhihu


🔔 News

  • [2024.10.30] The project page is now live: Project Page.

  • [2024.10.25] Code is available now!

  • [2024.10.25] We released the MMKE-Bench dataset on the 🤗 Huggingface Dataset hub.

  • [2024.10.25] We posted the paper on 🤗 Huggingface Papers.

🌟 Overview

TL;DR: We propose MMKE-Bench, a challenging benchmark for evaluating diverse semantic editing in real-world scenarios.

Figure: Overview of the MMKE-Bench dataset.

Our contributions can be summarized as follows:

1) Overview of MMKE-Bench: We introduce MMKE-Bench, a benchmark designed to test semantic editing capabilities in realistic scenarios. It represents knowledge in free-form natural language and covers three editing types aligned with practical contexts.

2) Benchmark construction pipeline: We describe the novel pipeline used to build the benchmark, which collects original knowledge, generates editable knowledge, and crafts evaluation questions according to specific principles.

3) Experimental analysis and challenges: We conduct extensive experiments with representative editing methods and large multimodal models, highlighting several limitations of existing knowledge-editing approaches in both single-edit and multi-edit settings.

🤗 Dataset

We introduce MMKE-Bench, a benchmark designed to evaluate the ability of LMMs to edit visual knowledge in real-world scenarios. MMKE-Bench incorporates three editing tasks: visual entity editing, visual semantic editing, and user-specific editing. Additionally, it uses free-form natural language to represent and edit knowledge, offering more flexibility. The benchmark includes 2,940 pieces of knowledge and 7,229 images across 110 fine-grained types, with automatically generated, human-verified evaluation questions.

You can download the MMKE-Bench data from the 🤗 Huggingface Dataset hub. The expected file structure is:

MMKE-Bench
|-- data_json
|   |-- entity
|   |   |-- train.json
|   |   |-- eval.json
|   |-- visual
|   |   |-- train.json
|   |   |-- eval.json
|   |-- user
|   |   |-- train.json
|   |   |-- eval.json
|-- data_image
|   |-- entity
|   |   |-- image.....
|   |-- visual
|   |   |-- image.....
|   |-- user
|   |   |-- image.....
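
As a quick sanity check after downloading, the annotation files can be inspected with a few lines of Python. The sketch below only assumes the directory layout shown above; field names and the exact top-level JSON structure are not assumed.

import json
from pathlib import Path

# Adjust this to wherever you placed the downloaded benchmark.
root = Path("MMKE-Bench/data_json")

for task in ["entity", "visual", "user"]:
    for split in ["train", "eval"]:
        path = root / task / f"{split}.json"
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        # Each file is assumed to hold a JSON list (or dict) of edit records.
        size = len(data) if isinstance(data, (list, dict)) else 1
        print(f"{path}: {type(data).__name__} with {size} entries")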

🛠️ Requirements and Installation

# clone MMKE-Bench
git clone https://github.com/mmke-bench-bigai/mmke-bench.git

cd mmke-bench

# create the conda environments
# for BLIP2 / LLaVA / MiniGPT-4 with FT, IKE, MEND, and SERAC
conda env create -f envs/mmke.yml

# for BLIP2 / LLaVA / MiniGPT-4 with KE
conda env create -f envs/mmke-ke.yml
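
After creating the environments, a minimal sanity check (a sketch, assuming the environment built from envs/mmke.yml is activated) is to confirm that PyTorch and CUDA are visible:

# Run inside the activated conda environment.
import torch

print("torch:", torch.__version__)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))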

💥 Training

For FT-LLM:

python multimodal_edit.py --function_name=test_FT_LLaVA --hop=1 --data_type=entity

function_name in ['test_FT_LLaVA','test_FT_MiniGPT4','test_FT_Blip2OPT']
data_type in ['entity','visual','user']

For FT-Alignment:

python multimodal_edit.py --function_name=test_FT_LLaVA_mmproj --hop=1 --data_type=entity

function_name in ['test_FT_LLaVA_mmproj','test_FT_MiniGPT4_Qformer','test_FT_Blip2OPT_QFormer']
data_type in ['entity','visual','user']

For SERAC:

python multimodal_edit.py --function_name=train_SERAC_LLaVA --hop=1 --data_type=entity

function_name in ['train_SERAC_LLaVA','train_SERAC_MiniGPT4','train_SERAC_Blip2OPT']
data_type in ['entity','visual','user']

For MEND:

python multimodal_edit.py --function_name=train_MEND_LLaVA --hop=1 --data_type=entity

function_name in ['train_MEND_LLaVA','train_MEND_MiniGPT4','train_MEND_Blip2OPT']
data_type in ['entity','visual','user']

For IKE:

python multimodal_edit.py --function_name=test_IKE_LLaVA --hop=1 --data_type=entity

function_name in ['test_IKE_LLaVA','test_IKE_MiniGPT4','test_IKE_Blip2OPT']
data_type in ['entity','visual','user']
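
All five methods above are launched through multimodal_edit.py, so a full sweep over models and data types can be scripted. The sketch below only composes the arguments documented above; the function names shown are the IKE ones, and the interpreter invocation is an assumption.

import itertools
import subprocess

# Swap in the FT, FT-Alignment, SERAC, or MEND function lists documented above
# to sweep those methods instead.
function_names = ["test_IKE_LLaVA", "test_IKE_MiniGPT4", "test_IKE_Blip2OPT"]
data_types = ["entity", "visual", "user"]

for fn, dt in itertools.product(function_names, data_types):
    cmd = [
        "python", "multimodal_edit.py",
        f"--function_name={fn}",
        "--hop=1",
        f"--data_type={dt}",
    ]
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)  # stop the sweep if any run fails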

For KE:

bash KE/train_ke.sh 0 llava entity
bash KE/train_ke.sh 0 minigpt4 entity
bash KE/train_ke.sh 0 blip2 entity

model_name in ['llava','minigpt4','blip2']
data_type in ['entity','visual','user']

bash KE/test_multihop.sh 0 llava 1 entity
bash KE/test_multihop.sh 0 minigpt4 1 entity
bash KE/test_multihop.sh 0 blip2 1 entity
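
The same kind of sweep works for KE, which is driven by the shell scripts above. This is a minimal sketch; the positional arguments simply mirror the example invocations, so their exact meaning beyond model and data type is an assumption.

import itertools
import subprocess

models = ["llava", "minigpt4", "blip2"]
data_types = ["entity", "visual", "user"]

for model, dt in itertools.product(models, data_types):
    # "0" and "1" copy the example commands above (presumably GPU index and hop).
    subprocess.run(["bash", "KE/train_ke.sh", "0", model, dt], check=True)
    subprocess.run(["bash", "KE/test_multihop.sh", "0", model, "1", dt], check=True)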

Editing GPU memory usage (memory/res_max_val)

| Entity | BLIP2-OPT | LLaVA-1.5 | MiniGPT-4 |
| --- | --- | --- | --- |
| FT-LLM | 21GB | 35GB | 45GB |
| FT-Alignment | 24GB | 40GB | 50GB |
| SERAC | 17GB | 80GB | 67GB |
| IKE | 14GB | 50GB | 26GB |
| MEND | 23GB | 61GB | 43GB |

| Visual | BLIP2-OPT | LLaVA-1.5 | MiniGPT-4 |
| --- | --- | --- | --- |
| FT-LLM | 21GB | 35GB | 45GB |
| FT-Alignment | 23GB | 39GB | 48GB |
| SERAC | 16GB | 73GB | 58GB |
| IKE | 15GB | 25GB | 25GB |
| MEND | 21GB | 55GB | 40GB |

| User | BLIP2-OPT | LLaVA-1.5 | MiniGPT-4 |
| --- | --- | --- | --- |
| FT-LLM | 21GB | 35GB | 45GB |
| FT-Alignment | 23GB | 38GB | 48GB |
| SERAC | 15GB | 71GB | 56GB |
| IKE | 23GB | 30GB | 28GB |
| MEND | 21GB | 54GB | 39GB |

Place the following model weights under the 'hugging_cache' folder and the 'openai' folder:

# models in hugging_cache folder
hugging_cache/
├── all-MiniLM-L6-v2/
├── bert-base-uncased/
├── distilbert-base-cased/
├── Llama-2-7b-hf/
├── llava-v1.5-7b/
├── mplug-owl2-llama2-7b/
├── opt-2.7b/
├── opt-125m/
├── vicuna-7b/
├── vicuna-7b-v1.5/
│
├── blip2_pretrained_flant5xxl.pth
├── blip2_pretrained_opt2.7b.pth
├── eva_vit_g.pth
└── pretrained_minigpt4_7b.pth

# clip-vit model in openai folder
openai/
└── clip-vit-large-patch14-336/

The download links are as follows:

all-MiniLM-L6-v2 · bert-base-uncased · distilbert-base-cased · llava-v1.5-7b · opt-2.7b · opt-125m · vicuna-7b · vicuna-7b-v1.5 · Llama-2-7b-hf · mplug-owl2-llama2-7b · blip2_pretrained_flant5xxl.pth · blip2_pretrained_opt2.7b.pth · pretrained_minigpt4_7b.pth · eva_vit_g.pth · clip-vit-large-patch14-336
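
Before launching any edits, it may help to confirm that every local checkpoint listed above is in place. This is a minimal sketch, assuming it runs from the repository root with hugging_cache/ and openai/ alongside it; the paths are taken verbatim from the folder layout above.

from pathlib import Path

# Directories and checkpoint files expected by the editing scripts.
expected = [
    "hugging_cache/all-MiniLM-L6-v2",
    "hugging_cache/bert-base-uncased",
    "hugging_cache/distilbert-base-cased",
    "hugging_cache/Llama-2-7b-hf",
    "hugging_cache/llava-v1.5-7b",
    "hugging_cache/mplug-owl2-llama2-7b",
    "hugging_cache/opt-2.7b",
    "hugging_cache/opt-125m",
    "hugging_cache/vicuna-7b",
    "hugging_cache/vicuna-7b-v1.5",
    "hugging_cache/blip2_pretrained_flant5xxl.pth",
    "hugging_cache/blip2_pretrained_opt2.7b.pth",
    "hugging_cache/eva_vit_g.pth",
    "hugging_cache/pretrained_minigpt4_7b.pth",
    "openai/clip-vit-large-patch14-336",
]

missing = [p for p in expected if not Path(p).exists()]
print("all assets present" if not missing else "missing: " + ", ".join(missing))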

(back to top)

✏️ Citation

If you find our paper and code useful in your research, please consider giving us a star ⭐ and a citation 📝.

@article{du2024mmke_bench,
  title  = {MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge},
  author = {Yuntao Du and Kailin Jiang and Zhi Gao and Chenrui Shi and Zilong Zheng and Siyuan Qi and Qing Li},
  year   = {2024}
}

⭐ Star History

Star History Chart

🎉 Contributors
