
Adding VLM pipeline #234


Closed

wants to merge 2 commits into from

Conversation

Contributor

qcdipankar commented Jan 21, 2025

We were able to create an AutoModel class for VLMs, specifically for phi3-vision.

Supported features:

  1. ONNX export, both for a single layer and for the full model
  2. generate
  3. compile
  4. Inference on PyTorch and on AI 100 produces correct outputs

TODO
The attention implementation of the CLIP transformer model needs to be set to eager manually for test_vlm_model to run; a minimal sketch of that workaround follows.
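For reference, a minimal sketch of that manual workaround, assuming the Hugging Face phi3-vision checkpoint name below (the flags mirror the ones used in the test code later in this PR):

```python
from transformers import AutoModelForCausalLM

# Assumed checkpoint name for phi3-vision; adjust to the model under test.
model_name = "microsoft/Phi-3-vision-128k-instruct"

# Force eager attention so the CLIP vision tower does not pick SDPA/flash attention,
# which test_vlm_model currently cannot handle.
model_hf = AutoModelForCausalLM.from_pretrained(
    model_name,
    _attn_implementation="eager",
    trust_remote_code=True,
)
```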

Signed-off-by: Dipankar Sarkar <quic_dipankar@quicinc.com>
-SEQ_LEN = 32
-CTX_LEN = 32
+SEQ_LEN = 1024
+CTX_LEN = 1280
Contributor

Changing this here will affect the causal_lm models; please define a separate set of constants for the VLM models.
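A sketch of what this comment is asking for, with the causal LM defaults left untouched and new VLM-specific names added alongside them (the module path and the VLM_* names are illustrative, not necessarily what the repo ends up using):

```python
# QEfficient/utils/constants.py (sketch)

# Existing causal LM defaults stay as they are.
SEQ_LEN = 32
CTX_LEN = 32

# VLM-specific defaults live under their own names, so changing them
# cannot affect the causal LM export and compile paths.
VLM_SEQ_LEN = 1024
VLM_CTX_LEN = 1280
```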

Contributor

Since the above seq_len/ctx_len values are not generalized across other VLMs, I think it is better to keep them inside the model-specific function.

Contributor Author

resolved in new patch

@@ -45,18 +45,18 @@ def get_models_dir():
QEFF_MODELS_DIR = get_models_dir()

ONNX_EXPORT_EXAMPLE_BATCH_SIZE = 1
-ONNX_EXPORT_EXAMPLE_SEQ_LEN = 32
+ONNX_EXPORT_EXAMPLE_SEQ_LEN = 1024
Contributor

Same here; please verify and make sure the existing causal LM pipeline is not broken.

Contributor Author

resolved in new patch

Contributor

Keep name test_image_text_to_text_models

Contributor Author

resolved in new patch

# raise TypeError("missing required argument: 'full_batch_size'")

# if kv_cache_batch_size and not full_batch_size:
# raise ValueError(
Contributor

Avoid leaving any commented-out lines in the code.

Contributor Author

resolved in new patch

@quic-rishinr
Contributor

Please add documentation and an example script.
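Roughly the kind of example script being requested: a sketch assuming the QEFFAutoModelForImageTextToText class added in this PR, a phi3-vision checkpoint, and the generate signature visible in the test diff (the compile arguments and image URL are illustrative):

```python
import requests
from PIL import Image
from transformers import AutoProcessor, TextStreamer

from QEfficient import QEFFAutoModelForImageTextToText

# Assumed checkpoint; any image-text-to-text model supported by this PR should work similarly.
model_name = "microsoft/Phi-3-vision-128k-instruct"

processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = QEFFAutoModelForImageTextToText.from_pretrained(model_name, trust_remote_code=True)

# Build multimodal inputs: one image plus a text prompt containing the image placeholder.
image = Image.open(requests.get("https://www.ilankelman.org/stopsigns/australia.jpg", stream=True).raw)
inputs = processor(text="<|image_1|>\nDescribe this image.", images=image, return_tensors="pt")

# Export to ONNX, compile for Cloud AI 100, then stream generated tokens.
model.compile(num_cores=14)
streamer = TextStreamer(processor.tokenizer)
model.generate(inputs, streamer, device_ids=[0])
```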

quic-amitraj marked this pull request as draft January 23, 2025 10:45
quic-amitraj self-requested a review January 23, 2025 17:39
Contributor

quic-amitraj left a comment

I see that code cleanup is still pending. Please address all the comments and remove all commented-out and unnecessary lines. Also update the docstrings accordingly.

from QEfficient import QEFFAutoModelForImageTextToText
from transformers import AutoTokenizer

model_name = "llava"
Contributor

Update Docstrings.

# warnings.warn(
# "full_batch_size argument is deprecated. Use continuous_batching=True instead.", DeprecationWarning, 2
# )
# breakpoint()
Contributor

Remove the commented-out code.

from transformers import AutoTokenizer

# Initialize the model using from_pretrained similar to transformers.AutoModelForCausalLM
model_name = "gpt2"
Contributor

Update here as well.

@quic-akuruvil
Contributor

Please verify the PR against transformers version 4.46.0.

@@ -24,6 +34,8 @@
from QEfficient.transformers.quantizers.auto import QEFF_AUTO_QUANTIZATION_CONFIG_MAPPING, with_replaced_quantizers
from QEfficient.transformers.quantizers.quant_transforms import AwqToMatmulNbitsTransform, GPTQToMatmulNbitsTransform
from QEfficient.utils import constants, get_padding_shape_from_config

# from QEfficient.transformers.models.phi3_vision.modeling_phi3_vision import Phi3VModelWrapper
Contributor

Please remove unused imports

@@ -421,6 +433,485 @@ def generate(
raise NotImplementedError("Only AI_100 runtime is supported right now via generate API")


class QEFFAutoModelForImageTextToText(QEFFTransformersBase):
"""
The QEFF class is designed for manipulating any causal language model from the HuggingFace hub.
Contributor

Please update the docstring; it still refers to a causal language model.

from transformers import AutoTokenizer

model_name = "llava"
model = QEFFAutoModelForCausalLM.from_pretrained(model_name, num_hidden_layers=2)
Contributor

Why is this using QEFFAutoModelForCausalLM? Can you update it with the right example?
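Presumably the docstring example should look more like this (a sketch; the llava checkpoint name and the processor usage are assumptions):

```python
from transformers import AutoProcessor

from QEfficient import QEFFAutoModelForImageTextToText

# Use the image-text-to-text class introduced in this PR, not QEFFAutoModelForCausalLM.
model_name = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_name)
model = QEFFAutoModelForImageTextToText.from_pretrained(model_name, num_hidden_layers=2)
```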

# warnings.warn(
# "full_batch_size argument is deprecated. Use continuous_batching=True instead.", DeprecationWarning, 2
# )
# breakpoint()
Contributor

Please remove the debugging/commented lines

self.continuous_batching = continuous_batching
self.is_tlm = is_tlm
self.pad_token_id = model.config.pad_token_id
self.ctx_len = 1280
Contributor

Why is ctx_len hardcoded?
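One way to avoid the hardcoded value, sketched under the assumption that ctx_len can simply be threaded through the constructor (the parameter name and the surrounding signature are illustrative):

```python
def __init__(self, model, continuous_batching=False, is_tlm=False, ctx_len=None, **kwargs):
    self.continuous_batching = continuous_batching
    self.is_tlm = is_tlm
    self.pad_token_id = model.config.pad_token_id
    # Take ctx_len from the caller (e.g. forwarded from compile()) instead of pinning it to 1280.
    self.ctx_len = ctx_len
```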

:ctx_len (int): Maximum context length to compile the model.
:n_layers (int): Number of layers for the Model.
"""
# replace_transformers_quantizers()
Contributor

Please remove unwanted commented lines

streamer = TextStreamer(processor)
# Testing for Phi-3.5 only atm
inputs = _generate_inputs(model_hf, processor)
breakpoint()
Contributor

Remove the breakpoint.

num_hidden_layers=n_layer,
_attn_implementation="eager",
trust_remote_code=True,
# Check if this works
Contributor

please remove unwanted commented line from this method

model_hf, _ = load_vlm_model(model_config)
# Load processor instead
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
# config = model_hf.config
Contributor

remove unwanted commented line

qeff_model.generate(inputs, streamer, device_ids=[0], runtime_ai100=False)
# cloud_ai_100_tokens = exec_info[0] # Because we always run for single input and single batch size
# gen_len = ort_tokens.shape[-1]
# assert (
Contributor

Why are all the asserts commented out? The objective of this method is to test output correctness: native PyTorch output vs transformed PyTorch output vs ORT output vs AI 100 output. Please add all of those checks back.
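Roughly the check that is expected back, sketched against the variable names already visible in the diff (the exec_info indexing and the exact comparison follow the pattern used in the causal LM tests, and are assumptions here):

```python
# Compare Cloud AI 100 output against the ONNX Runtime reference, token by token.
cloud_ai_100_tokens = exec_info[0]  # single input, single batch size
gen_len = ort_tokens.shape[-1]
assert (ort_tokens == cloud_ai_100_tokens[:, :gen_len]).all(), (
    "Tokens from ONNX Runtime and Cloud AI 100 do not match"
)
```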

Signed-off-by: Dipankar Sarkar <quic_dipankar@quicinc.com>
from QEfficient.utils import hf_download


def load_vlm_model(model_config):
Contributor

Change name to image_text_to_text

)
inputs["attention_mask"] = torch.nn.functional.pad(
inputs["attention_mask"], (0, 1024 - input_ids_size), "constant", 0
)
Contributor

The hardcoded 1024 values have to be replaced with constant.seq_len.
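A sketch of the suggested change, assuming the seq_len value should come from the constants module imported elsewhere in this PR (the exact constant name is an assumption):

```python
import torch

from QEfficient.utils import constants

# Pad the attention mask up to the configured sequence length instead of a hardcoded 1024.
pad_len = constants.ONNX_EXPORT_EXAMPLE_SEQ_LEN - input_ids_size
inputs["attention_mask"] = torch.nn.functional.pad(
    inputs["attention_mask"], (0, pad_len), "constant", 0
)
```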

@quic-amitraj
Contributor

Already addressed in #267
