- Table of Contents
- News
- Overview
- Dataset
- Requirements and Installation
- Training
- Citation
- Star History
- Contributors
- [2024.10.30] The project page is now live: Project Page.
- [2024.10.25] Code is available now!
- [2024.10.25] We release the MMKE-Bench dataset at the 🤗 Hugging Face Dataset.
- [2024.10.25] We posted the paper on 🤗 Hugging Face Papers.
TL;DR: We propose MMKE-Bench, a challenging benchmark for evaluating diverse semantic editing in real-world scenarios.
Overview of the MMKE-Bench dataset. Our contributions can be summarized as follows:
1) Overview of MMKE-Bench: MMKE-Bench is introduced as a benchmark designed to test semantic editing capabilities in realistic scenarios. It utilizes natural language for knowledge representation and includes three editing types aligned with practical contexts.
2) Development of the Benchmark Pipeline: We develop a novel pipeline to construct the benchmark, which collects original knowledge, generates editable knowledge, and crafts evaluation questions according to specific principles.
3) Experimental Analysis and Challenges: We conduct extensive experiments with representative editing methods and large multimodal models, highlighting several limitations of existing knowledge-editing approaches in both single-edit and multi-edit settings.
We introduce MMKE-Bench, a benchmark designed to evaluate the ability of LMMs to edit visual knowledge in real-world scenarios. MMKE-Bench incorporates three editing tasks: visual entity editing, visual semantic editing, and user-specific editing. Additionally, it uses free-form natural language to represent and edit knowledge, offering more flexibility. The benchmark includes 2,940 pieces of knowledge and 7,229 images across 110 fine-grained types, with automatically generated, human-verified evaluation questions.
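For illustration, each piece of knowledge pairs an image with free-form natural-language descriptions and evaluation questions. The sketch below shows what a single record *might* look like; the field names are hypothetical, and the authoritative schema is defined by the released JSON files.

```python
# Hypothetical sketch of one visual-entity editing record.
# Field names are illustrative only; the real schema is defined by the
# released data_json/*/train.json and eval.json files.
example_record = {
    "type": "entity",                      # one of: entity / visual / user
    "image": "data_image/entity/some_image.jpg",  # hypothetical image path
    "original_knowledge": "A free-form description of the original fact.",
    "edited_knowledge": "A free-form description of the counterfactual edit.",
    "evaluation_questions": [
        "A question probing whether the edit was applied.",
    ],
}
```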
You can download the MMKE-Bench data from the 🤗 Hugging Face Dataset. The expected file structure is:
MMKE-Bench
|-- data_json
| |-- entity
| | |-- train.json
| | |-- eval.json
| |-- visual
| | |-- train.json
| | |-- eval.json
| |-- user
| | |-- train.json
| | |-- eval.json
|-- data_image
| |-- entity
| | |-- image.....
| |-- visual
| | |-- image.....
| |-- user
| | |-- image.....
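As a quick sanity check after downloading, you can load one of the JSON splits with a few lines of Python (a minimal sketch that only assumes the directory layout above; the per-record fields follow the released schema):

```python
import json
from pathlib import Path

# Assumes the layout shown above: data_json/<task>/{train,eval}.json
# plus data_image/<task>/ for the corresponding images,
# and that each split is a JSON list of records.
root = Path("MMKE-Bench")
task = "entity"  # or "visual" / "user"

with open(root / "data_json" / task / "train.json", encoding="utf-8") as f:
    records = json.load(f)

print(f"{task}: {len(records)} training records")
print("Fields of the first record:", sorted(records[0].keys()))
```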
# clone MMKE-Bench
git clone https://github.com/mmke-bench-bigai/mmke-bench.git
cd mmke-bench
# create conda env
# for BLIP2, LLaVA, and MiniGPT-4 with FT, IKE, MEND, and SERAC
conda env create -f envs/mmke.yml
# for BLIP2, LLaVA, and MiniGPT-4 with KE
conda env create -f envs/mmke-ke.yml
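Once an environment is created, a quick check (generic, not specific to this repo) confirms that PyTorch can see your GPU before launching any editing run:

```python
# Quick environment sanity check (generic; assumes the env ships PyTorch).
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```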
For FT-LLM:
python multimodal_edit.py --function_name=test_FT_LLaVA --hop=1 --data_type=entity
function_name in ['test_FT_LLaVA','test_FT_MiniGPT4','test_FT_Blip2OPT']
data_type in ['entity','visual','user']
For FT-Alignment:
python multimodal_edit.py --function_name=test_FT_LLaVA_mmproj --hop=1 --data_type=entity
function_name in ['test_FT_LLaVA_mmproj','test_FT_MiniGPT4_Qformer','test_FT_Blip2OPT_QFormer']
data_type in ['entity','visual','user']
For SERAC:
python multimodal_edit.py --function_name=train_SERAC_LLaVA --hop=1 --data_type=entity
function_name in ['train_SERAC_LLaVA','train_SERAC_MiniGPT4','train_SERAC_Blip2OPT']
data_type in ['entity','visual','user']
For MEND:
python multimodal_edit.py --function_name=train_MEND_LLaVA --hop=1 --data_type=entity
function_name in ['train_MEND_LLaVA','train_MEND_MiniGPT4','train_MEND_Blip2OPT']
data_type in ['entity','visual','user']
For IKE:
python multimodal_edit.py --function_name=test_IKE_LLaVA --hop=1 --data_type=entity
function_name in ['test_IKE_LLaVA','test_IKE_MiniGPT4','test_IKE_Blip2OPT']
data_type in ['entity','visual','user']
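To sweep all of the methods and data types above in one go, a small driver script can loop over the same --function_name and --data_type values (a convenience sketch; it only reuses the flags shown in the commands above and assumes it is run from the repo root):

```python
# Convenience sketch: sweep multimodal_edit.py over the combinations listed above.
# It reuses only the --function_name / --data_type / --hop flags shown in this README.
import itertools
import subprocess

function_names = ["test_FT_LLaVA", "test_FT_MiniGPT4", "test_FT_Blip2OPT"]  # e.g. the FT-LLM set
data_types = ["entity", "visual", "user"]

for fn, dt in itertools.product(function_names, data_types):
    cmd = [
        "python", "multimodal_edit.py",
        f"--function_name={fn}",
        "--hop=1",
        f"--data_type={dt}",
    ]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```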
For KE:
bash KE/train_ke.sh 0 llava entity
bash KE/train_ke.sh 0 minigpt4 entity
bash KE/train_ke.sh 0 blip2 entity
model_name in ['llava','minigpt4','blip2']
data_type in ['entity','visual','user']
bash KE/test_multihop.sh 0 llava 1 entity
bash KE/test_multihop.sh 0 minigpt4 1 entity
bash KE/test_multihop.sh 0 blip2 1 entity
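Similarly, KE training and multi-hop evaluation can be swept over all models and data types (a sketch that reuses the positional arguments of the commands above: GPU id, model name, hop, and data type):

```python
# Sketch: run KE training and multi-hop evaluation for every model / data type.
# Positional arguments follow the examples above:
#   train_ke.sh      <gpu> <model> <data_type>
#   test_multihop.sh <gpu> <model> <hop> <data_type>
import subprocess

gpu, hop = "0", "1"
for model in ["llava", "minigpt4", "blip2"]:
    for data_type in ["entity", "visual", "user"]:
        subprocess.run(["bash", "KE/train_ke.sh", gpu, model, data_type], check=True)
        subprocess.run(["bash", "KE/test_multihop.sh", gpu, model, hop, data_type], check=True)
```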
Editing GPU memory usage (memory/res_max_val)
| Entity | BLIP2-OPT | LLaVA-1.5 | MiniGPT-4 |
|---|---|---|---|
| FT-LLM | 21GB | 35GB | 45GB |
| FT-Alignment | 24GB | 40GB | 50GB |
| SERAC | 17GB | 80GB | 67GB |
| IKE | 14GB | 50GB | 26GB |
| MEND | 23GB | 61GB | 43GB |

| Visual | BLIP2-OPT | LLaVA-1.5 | MiniGPT-4 |
|---|---|---|---|
| FT-LLM | 21GB | 35GB | 45GB |
| FT-Alignment | 23GB | 39GB | 48GB |
| SERAC | 16GB | 73GB | 58GB |
| IKE | 15GB | 25GB | 25GB |
| MEND | 21GB | 55GB | 40GB |

| User | BLIP2-OPT | LLaVA-1.5 | MiniGPT-4 |
|---|---|---|---|
| FT-LLM | 21GB | 35GB | 45GB |
| FT-Alignment | 23GB | 38GB | 48GB |
| SERAC | 15GB | 71GB | 56GB |
| IKE | 23GB | 30GB | 28GB |
| MEND | 21GB | 54GB | 39GB |
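The numbers above are peak GPU memory during editing. To log a comparable value in your own runs, PyTorch's CUDA memory statistics can be queried after an edit finishes (a generic sketch; the repository may record memory/res_max_val differently):

```python
# Generic sketch for logging peak GPU memory with PyTorch; assumes a CUDA device.
import torch

torch.cuda.reset_peak_memory_stats()
# ... run an editing method here ...
peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
print(f"Peak allocated GPU memory: {peak_gb:.1f} GB")
```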
Place the pretrained models under the 'hugging_cache' and 'openai' folders as follows:
# models in hugging_cache folder
hugging_cache/
├── all-MiniLM-L6-v2/
├── bert-base-uncased/
├── distilbert-base-cased/
├── Llama-2-7b-hf/
├── llava-v1.5-7b/
├── mplug-owl2-llama2-7b/
├── opt-2.7b/
├── opt-125m/
├── vicuna-7b/
├── vicuna-7b-v1.5/
│
├── blip2_pretrained_flant5xxl.pth
├── blip2_pretrained_opt2.7b.pth
├── eva_vit_g.pth
└── pretrained_minigpt4_7b.pth
# clip-vit model in openai folder
openai/
└── clip-vit-large-patch14-336/
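Before training, a short script can verify that every expected model folder and checkpoint is in place (a sketch that only checks the paths listed above):

```python
# Sketch: verify that the model folders and checkpoints listed above exist.
from pathlib import Path

expected = [
    "hugging_cache/all-MiniLM-L6-v2",
    "hugging_cache/bert-base-uncased",
    "hugging_cache/distilbert-base-cased",
    "hugging_cache/Llama-2-7b-hf",
    "hugging_cache/llava-v1.5-7b",
    "hugging_cache/mplug-owl2-llama2-7b",
    "hugging_cache/opt-2.7b",
    "hugging_cache/opt-125m",
    "hugging_cache/vicuna-7b",
    "hugging_cache/vicuna-7b-v1.5",
    "hugging_cache/blip2_pretrained_flant5xxl.pth",
    "hugging_cache/blip2_pretrained_opt2.7b.pth",
    "hugging_cache/eva_vit_g.pth",
    "hugging_cache/pretrained_minigpt4_7b.pth",
    "openai/clip-vit-large-patch14-336",
]

missing = [p for p in expected if not Path(p).exists()]
print("All models present." if not missing else f"Missing: {missing}")
```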
If you find our paper and code useful in your research, please consider giving the repository a star ⭐ and citing our work.
@article{du2024mmke_bench,
title = {MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge},
author = {Yuntao Du and Kailin Jiang and Zhi Gao and Chenrui Shi and Zilong Zheng and Siyuan Qi and Qing Li},
year = {2024}
}