MLA layer eliminates redundant index operators #993

huiyingCCCC · 2025-05-28T09:45:54Z

What this PR does / why we need it?

During autoregressive decoding, the cos and sin values are identical across all layers (for example, all 61 layers). They therefore only need to be computed in the first layer, and subsequent layers can reuse them directly.

Does this PR introduce any user-facing change?

How was this patch tested?

MengqingCao

Please add a PR description.
Run bash format.sh locally to fix the lint failures.

MengqingCao · 2025-05-29T07:26:05Z

vllm_ascend/attention/attention.py

-
-            q_pe = self.rope_single(q_pe, cos, sin)
-            k_pe, k_nope = self.exec_kv(hidden_states_or_kv_c_normed, cos, sin,
+            if self.layer_idx == 0 or self.cos is None or self.sin is None:


Could you add comments explaining why self.cos and self.sin are only updated when layer_idx == 0?

During autoregressive decoding, the cos and sin values are identical across all layers (for example, all 61 layers). They therefore only need to be computed in the first layer, and subsequent layers can reuse them directly.

Thanks for the explanation. Let's add this as a comment in the code.
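For readers following the thread, the reuse pattern discussed here can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in vllm_ascend/attention/attention.py: the names StepRopeCache, ToyAttentionLayer, rotary_cos_sin and run_decode_step are hypothetical, the standard rotary frequency formula with base 10000 is assumed, and the shared cache object is only one possible way for later layers to see the values computed by layer 0.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Table = List[List[float]]


def rotary_cos_sin(positions: List[int], rot_dim: int,
                   base: float = 10000.0) -> Tuple[Table, Table]:
    """Compute per-position cos/sin tables for rotary embeddings."""
    inv_freq = [base ** (-2.0 * i / rot_dim) for i in range(rot_dim // 2)]
    cos = [[math.cos(p * f) for f in inv_freq] for p in positions]
    sin = [[math.sin(p * f) for f in inv_freq] for p in positions]
    return cos, sin


@dataclass
class StepRopeCache:
    """Hypothetical holder shared by every layer (an assumption of this sketch)."""
    cos: Optional[Table] = None
    sin: Optional[Table] = None
    computations: int = 0  # counts how often the tables were actually rebuilt


class ToyAttentionLayer:
    def __init__(self, layer_idx: int, cache: StepRopeCache, rot_dim: int = 8):
        self.layer_idx = layer_idx
        self.cache = cache
        self.rot_dim = rot_dim

    def get_cos_sin(self, positions: List[int]) -> Tuple[Table, Table]:
        cache = self.cache
        # Positions change on every decode step, so layer 0 always refreshes.
        # Other layers recompute only if nothing has been cached yet.
        if self.layer_idx == 0 or cache.cos is None or cache.sin is None:
            cache.cos, cache.sin = rotary_cos_sin(positions, self.rot_dim)
            cache.computations += 1
        return cache.cos, cache.sin


def run_decode_step(layers: List[ToyAttentionLayer],
                    positions: List[int]) -> List[Tuple[Table, Table]]:
    """Call every layer in order, as one forward pass of a decode step would."""
    return [layer.get_cos_sin(positions) for layer in layers]


if __name__ == "__main__":
    cache = StepRopeCache()
    layers = [ToyAttentionLayer(i, cache) for i in range(61)]
    run_decode_step(layers, positions=[5])
    print("table builds after one step:", cache.computations)
```

The sketch relies on layers running in order within a step, so layer 0 always refreshes the tables before any other layer reads them. The fallback check on None covers the case where a later layer is called before anything has been cached.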

MengqingCao · 2025-05-29T07:27:11Z

vllm_ascend/models/deepseek_v2.py

@@ -392,6 +392,7 @@ def __init__(
            kv_a_layernorm=self.kv_a_layernorm,
            kv_b_proj=self.kv_b_proj,
            o_proj=self.o_proj,
+            ascend_prefix=prefix,


I recommend passing debug_layer_idx instead of ascend_prefix.
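One way to read this suggestion is to resolve the layer index once, where the module is constructed, and pass an integer down rather than the whole prefix string. The sketch below is only an illustration: layer_index_from_prefix is a hypothetical helper, and the dotted prefix format "model.layers.3.self_attn" is an assumption, since the thread does not show what prefix contains or how debug_layer_idx is populated.

```python
def layer_index_from_prefix(prefix: str) -> int:
    """Hypothetical helper: return the single integer component of a dotted prefix.

    Assumes prefixes such as "model.layers.3.self_attn"; raises if the prefix
    contains no integer component or more than one.
    """
    indices = [int(part) for part in prefix.split(".") if part.isdigit()]
    if len(indices) != 1:
        raise ValueError(
            f"expected exactly one integer component in prefix {prefix!r}, "
            f"found {len(indices)}")
    return indices[0]


if __name__ == "__main__":
    # Resolve the index once at construction time and pass the integer on.
    print(layer_index_from_prefix("model.layers.3.self_attn"))
```

Passing an integer keeps the attention code free of string parsing and makes the layer_idx == 0 check explicit at the call site.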

Signed-off-by: huiying <chenhuiying4@huawei.com>

MengqingCao reviewed May 29, 2025

huiyingCCCC force-pushed the main branch 3 times, most recently from 742a3ee to 8cd3c81 Compare May 29, 2025 09:28

huiyingCCCC changed the title ~~MLA层消除冗余index小算子~~ MLA layer eliminates redundant index operators May 30, 2025

MLA layer eliminates redundant index operators

cb8064b

Signed-off-by: huiying <chenhuiying4@huawei.com>

huiyingCCCC force-pushed the main branch from 8cd3c81 to cb8064b Compare May 30, 2025 07:36
