VLM Pipeline (Intern,LLava) #256
Conversation
Signed-off-by: Dipankar Sarkar <quic_dipankar@quicinc.com>
        return generate_func(**kwargs)

    def generate_inputs_intern(self, **kwargs):
        bs: int = constants.ONNX_EXPORT_EXAMPLE_BATCH_SIZE
Please move this inside the modeling file.
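For illustration, a minimal sketch of what colocating the input generation with the modeling code could look like. The helper name, the seq_len/img_size defaults, and the import path are assumptions for this sketch, not the repo's actual API; only ONNX_EXPORT_EXAMPLE_BATCH_SIZE comes from the diff above.

# Sketch only: would live in the InternVL modeling file rather than the pipeline class.
import torch
from QEfficient.utils import constants  # assumed import path

def get_intern_dummy_inputs(img_size: int = 448, seq_len: int = 1024):
    """Build example inputs and export metadata next to the model definition."""
    bs = constants.ONNX_EXPORT_EXAMPLE_BATCH_SIZE
    inputs = {
        "input_ids": torch.zeros((bs, seq_len), dtype=torch.int64),
        "pixel_values": torch.zeros((bs, 3, img_size, img_size), dtype=torch.float32),
    }
    dynamic_axes = {
        "input_ids": {0: "batch_size", 1: "seq_len"},
        "pixel_values": {0: "batch_size"},
    }
    output_names = ["logits"]
    return inputs, dynamic_axes, output_names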
        return inputs, dynamic_axes, output_names

    def generate_inputs_llava(self, **kwargs):
        bs: int = constants.ONNX_EXPORT_EXAMPLE_BATCH_SIZE
Move this inside the modeling file, modelling_llava.py.
# )
# num_logits_to_keep = num_speculative_tokens + 1
# if prefill_seq_len < num_logits_to_keep:
#     raise ValueError(
Remove commented lines.
generation_len = self.ctx_len - input_len.max()  # in standalone this is a tensor
assert generation_len > 0, "generation length should be greater than zero"
generated_ids = np.full((batch_size, generation_len + 1), self.processor.tokenizer.pad_token_id)
# inputs["input_ids"] = torch.nn.functional.pad(inputs["input_ids"], (0, self.seq_len_constant - inputs["input_ids"].size(1)), "constant", self.pad_token_id)
Remove.
PROMPT_LEN = 8
INPUT_STR = ["My name is"]
GB = 2**30
MAX_QPC_LIMIT = 30
MAX_RETRIES = 5  # Maximum number of retry attempts when downloading a model via huggingface_hub snapshot_download
NUM_SPECULATIVE_TOKENS = 2
CTX_LEN_VLM_LLAVA = 1280
IMG_SIZE = 336
Are you using these at the time of export to define shapes?
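If they are meant to define export shapes, a hedged sketch of how they could size the export-time example tensors; the helper name and its defaults are illustrative assumptions, not existing code.

import torch

# Illustrative only: constants like CTX_LEN_VLM_LLAVA / IMG_SIZE feed the example shapes.
def llava_export_example_inputs(batch_size=1, ctx_len=1280, img_size=336):
    return {
        "input_ids": torch.zeros((batch_size, ctx_len), dtype=torch.int64),
        "attention_mask": torch.ones((batch_size, ctx_len), dtype=torch.int64),
        "pixel_values": torch.zeros((batch_size, 3, img_size, img_size), dtype=torch.float32),
    }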
# if __name__ == "__main__":
#     # model_name = "OpenGVLab/InternVL2_5-1B"
Remove commented parts.
@@ -251,6 +252,7 @@ def _compile(
        if num_speculative_tokens:
            compile_hash.update(to_hashable({"num_speculative_tokens": num_speculative_tokens}))

        # import ipdb; ipdb.set_trace()
Remove these lines.
            if hasattr(module, "__qeff_init__"):
                module.__qeff_init__()
            transformed = True
Can we combine both if conditions?
        input_ids_size = input_ids.shape[1]
        # attention_mask = inputs["attention_mask"]
        inputs["input_ids"] = torch.nn.functional.pad(
            inputs["input_ids"], (0, 3072 - input_ids_size), "constant", self.processor.tokenizer.pad_token_id
Please avoid hardcoded value.
Make this value generic and fetch it from the QPC session (prefill_seq_len), i.e. whichever value the model was compiled for.
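A minimal sketch of fetching the compiled prefill length from the QPC session instead of the hardcoded 3072; the attribute names (allowed_shapes, binding_index_map, bindings) follow the existing text-generation session helpers and should be treated as assumptions here.

def fetch_prefill_seq_len(session):
    """Return the prefill sequence length the QPC was compiled for."""
    idx = session.binding_index_map["input_ids"]
    if session.allowed_shapes:
        # Multiple specializations: take the largest allowed seq_len for input_ids.
        return max(shape[idx][1][1] for shape in session.allowed_shapes)
    # Single specialization: read the dimension straight from the binding.
    return session.bindings[idx].dims[1]

# Usage: pad to the compiled length rather than a literal 3072.
# prefill_seq_len = fetch_prefill_seq_len(self._session)
# inputs["input_ids"] = torch.nn.functional.pad(
#     inputs["input_ids"], (0, prefill_seq_len - input_ids_size),
#     "constant", self.processor.tokenizer.pad_token_id,
# )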
        breakpoint()
        self.model.config.use_cache = True
        self.processor = processor
        self.num_layers = model.config.text_config.num_hidden_layers
Make fetching num_layers generic, and the padding shape as well. Please refer to the llava PR and use a similar helper that fetches these based on the model architecture.
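A hedged sketch of the kind of architecture-aware helper being suggested; the sub-config names probed below are common Hugging Face conventions and may not cover every VLM. A similar probe could return the hidden size or number of KV heads needed to build the past-KV padding shape generically.

def get_num_hidden_layers(config):
    """Fetch num_hidden_layers from whichever sub-config the architecture nests it in."""
    for attr in ("text_config", "llm_config", "language_config"):
        sub_config = getattr(config, attr, None)
        if sub_config is not None and hasattr(sub_config, "num_hidden_layers"):
            return sub_config.num_hidden_layers
    # Plain decoder-only models keep it on the top-level config.
    return config.num_hidden_layers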
Already addressed in #267.
Added Generic Framework to onboard and run VLMs in QEff