Prior work in concept-bottleneck visual recognition aims to leverage discriminative visual concepts to enable more accurate object classification. Escher is an approach for iteratively evolving a visual concept library using feedback from a VLM critic to discover descriptive visual concepts.
More details can be found on the project page.
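At a high level, the self-evolving process in `escher/iteration.py` looks roughly like the sketch below. All function and argument names here are illustrative (not the actual Escher API), and it assumes the common classification-by-description JSON layout of `{class_name: [descriptor, ...]}`:

```python
import json
from collections import Counter

def evolve_concept_library(dataset, num_iters, classify_fn, propose_fn):
    """Illustrative sketch of the self-evolving loop.

    classify_fn(library) -> iterable of (true_class, predicted_class) pairs,
        produced by scoring the dataset with the current concept library.
    propose_fn(class_a, class_b, library) -> {class_name: [new descriptors]},
        produced by prompting the LLM critic about a confused class pair.
    """
    # Seed the library with the initial descriptors shipped in the repo.
    with open(f"descriptors/cbd_descriptors/descriptors_{dataset}.json") as f:
        library = json.load(f)  # {class_name: [descriptor, ...]}

    for _ in range(num_iters):
        # Score the dataset with the current concepts and tally which classes
        # the recognition model confuses with each other.
        confusions = Counter(
            (true, pred) for true, pred in classify_fn(library) if true != pred
        )
        # Ask the critic for descriptors that disambiguate the worst offenders.
        for (class_a, class_b), _ in confusions.most_common(10):
            for cls, new_descs in propose_fn(class_a, class_b, library).items():
                library.setdefault(cls, []).extend(new_descs)
    return library
```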
With Escher:
$ conda create --name escher-dev python=3.12
$ conda activate escher-dev
$ pip install -e . # Installs escher in editable mode
$ vim escher/cbd_utils/server.py
# Edit this file to point to the correct GPT model, API key location, etc.
$ CUDA_VISIBLE_DEVICES=4 python escher/iteration.py --dataset cub --topk 50 --prompt_type confound_w_descriptors_with_conversational_history --distance_type confusion --subselect -1 --decay_factor 10 --classwise_topk 10 --num_iters 100 --perc_labels 0.0 --perc_initial_descriptors 1.00 --algorithm lm4cv --salt "1.debug"
# ^ This runs against OpenAI's gpt-3.5-turbo; no vLLM instance is required.
$ CUDA_VISIBLE_DEVICES=1 python escher/iteration.py --dataset cub --topk 50 --openai_model LOCAL:meta-llama/Llama-3.1-8B-Instruct --prompt_type confound_w_descriptors_with_conversational_history --distance_type confusion --subselect -1 --decay_factor 10 --classwise_topk 10 --num_iters 100 --perc_labels 0.0 --perc_initial_descriptors 1.00 --algorithm lm4cv --salt "1.debug"
# ^ Same command, but the LOCAL: prefix makes it use the vllm_client defined in `server.py` and calls the Llama-3.1-8B-Instruct model.
# To run on multiple datasets, read cmds.sh
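The contents of `escher/cbd_utils/server.py` aren't reproduced here; the edit it asks for usually boils down to pointing an OpenAI client at the right model and API key, roughly like the following (all names in this snippet are hypothetical, not the file's actual structure):

```python
import os
from openai import OpenAI

# Hypothetical configuration; the real server.py may organize this differently.
OPENAI_MODEL = "gpt-3.5-turbo"  # model used when no LOCAL: prefix is given
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])  # read the key from the environment

def chat(messages):
    # Minimal chat-completion helper around the configured model.
    response = client.chat.completions.create(model=OPENAI_MODEL, messages=messages)
    return response.choices[0].message.content
```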
With vLLM (if using local LLM):
$ conda create --name vllm python=3.9
$ conda activate vllm
$ pip install vllm
$ CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve meta-llama/Llama-3.1-8B-Instruct --api-key token-abc123 --port 11440 --tensor-parallel-size 4 --disable-log-requests
# ^ This is a debugging server. Running with meta-llama/Llama-3.3-70B-Instruct is recommended.
$ CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve meta-llama/Llama-3.3-70B-Instruct --api-key token-abc123 --port 11440 --tensor-parallel-size 4 --disable-log-requests
# ^ See if this fits on your GPUs. I wasn't able to get it to work and had to use a quantized model (which also works well).
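Once the vLLM server is up, it exposes an OpenAI-compatible endpoint under `/v1`, so it can be queried with the standard `openai` client. A quick way to check the server is reachable (assuming the port and API key from the commands above):

```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server started above.
client = OpenAI(base_url="http://localhost:11440/v1", api_key="token-abc123")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # must match the model passed to `vllm serve`
    messages=[{"role": "user", "content": "List three visual features of a Painted Bunting."}],
)
print(response.choices[0].message.content)
```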
Repository layout:
- `escher/`
  - `iteration.py`: Main file to run the self-evolving process. It parses the command-line arguments, loads the initial set of concepts defined in `descriptors/cbd_descriptors/descriptors_{dataset}.json`, and instantiates the model.
  - `library/`
    - `library.py`: A wrapper around the library of concepts. This class is responsible for loading, updating, and saving the concepts (see the sketch after this listing).
    - `history_conditioned_library.py`: An extension of `library.py` that uses the history of the concepts to condition library generation.
  - `models/`
    - `model.py`: The abstract class for a concept-bottleneck-based model.
    - `model_zero_shot.py`: Implementation of the model that does not train any parameters.
    - `model_lm4cv.py`: Implementation of the model that trains the parameters of the model.
  - `utils/dataset_loader.py`: Main entry point for loading the dataset.
  - `cbd_utils/`: Utility functions for GPT calling / training / evaluation / caching, useful for implementing a "zeroshot" Escher model.
  - `lm4cv/`: Utility functions for implementing an "lm4cv" Escher model.
- `descriptors/`: Contains the initial descriptors for each dataset.
- `cmds.sh`: A set of hacky scripts to run escher on multiple datasets.
- `cache/`: A cache of the image/text embeddings for all the datasets. This folder is too big to keep on GitHub, but a zipped version is available at this Google Drive link: Link, or by emailing me at atharvas@utexas.edu. I hope this eases some of the pain of setting up the datasets, but unfortunately I'm not equipped to answer general questions about dataset setup.
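For orientation, the `library.py` wrapper described above exposes roughly this kind of interface. This is only a sketch under assumed method names, not the actual class:

```python
import json

class ConceptLibrary:
    """Illustrative wrapper around a concept library (load / update / save)."""

    def __init__(self, path):
        self.path = path
        self.concepts = {}  # {class_name: [descriptor, ...]}

    def load(self):
        # Load the current descriptors from disk.
        with open(self.path) as f:
            self.concepts = json.load(f)
        return self

    def update(self, class_name, new_descriptors):
        # Add newly proposed descriptors for a class, skipping duplicates.
        existing = self.concepts.setdefault(class_name, [])
        existing.extend(d for d in new_descriptors if d not in existing)

    def save(self, path=None):
        # Persist the (possibly updated) library back to disk.
        with open(path or self.path, "w") as f:
            json.dump(self.concepts, f, indent=2)
```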