
MLA layer eliminates redundant index operators #993


Open: huiyingCCCC wants to merge 1 commit into base: main

Conversation

@huiyingCCCC commented on May 28, 2025

What this PR does / why we need it?

During autoregressive decoding, the cos and sin values are exactly the same for every layer (e.g., across all 61 layers), so they only need to be computed in the first layer; subsequent layers can reuse them directly.
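A minimal sketch of the reuse idea, assuming the cos/sin tables live on a rotary-embedding object and the cache is shared across layer instances; the class and attribute names below are illustrative, not the actual vllm-ascend code:

import torch
from types import SimpleNamespace


class MLALayerSketch:
    # Shared cache: cos/sin depend only on the token positions, never on the
    # layer index, so one gather per decode step is enough for all layers.
    _cos = None
    _sin = None

    def __init__(self, layer_idx, rotary_emb):
        self.layer_idx = layer_idx
        self.rotary_emb = rotary_emb  # assumed to hold cos_cached / sin_cached tables

    def get_cos_sin(self, positions):
        cls = type(self)
        # Only layer 0 (or a cold cache) pays for the index/gather operators;
        # layers 1..N-1 simply reuse the tensors computed for layer 0.
        if self.layer_idx == 0 or cls._cos is None or cls._sin is None:
            cls._cos = self.rotary_emb.cos_cached.index_select(0, positions)
            cls._sin = self.rotary_emb.sin_cached.index_select(0, positions)
        return cls._cos, cls._sin


# Illustration with a stand-in rotary table:
rope = SimpleNamespace(cos_cached=torch.randn(1024, 64),
                       sin_cached=torch.randn(1024, 64))
layers = [MLALayerSketch(i, rope) for i in range(4)]
pos = torch.tensor([5])
cos0, _ = layers[0].get_cos_sin(pos)  # computed here (index ops run once)
cos3, _ = layers[3].get_cos_sin(pos)  # reused, no redundant index op
assert cos0 is cos3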

Does this PR introduce any user-facing change?

How was this patch tested?

Collaborator @MengqingCao left a comment:


  1. Please add a PR description
  2. Run bash format.sh locally to fix the lint failures


q_pe = self.rope_single(q_pe, cos, sin)
k_pe, k_nope = self.exec_kv(hidden_states_or_kv_c_normed, cos, sin,
if self.layer_idx == 0 or self.cos is None or self.sin is None:
Collaborator: Could you add a comment explaining why self.cos and self.sin are updated only when layer_idx == 0?

Author: During autoregressive decoding, the cos and sin values are exactly the same for every layer (e.g., across all 61 layers), so they only need to be computed in the first layer; subsequent layers can reuse them directly.

Collaborator: Thanks for the explanation; let's add this as a comment in the code.
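For reference, one way the requested explanation could read next to the guard from the diff above; the assignment lines and names around the guard are placeholders, not the PR's actual code:

# cos/sin depend only on the token positions, not on the layer index, so they
# are identical for every decoder layer within one decode step. Compute them
# once in layer 0 and reuse the cached tensors in all later layers.
if self.layer_idx == 0 or self.cos is None or self.sin is None:
    self.cos = cos_table.index_select(0, positions)  # placeholder names
    self.sin = sin_table.index_select(0, positions)
cos, sin = self.cos, self.sin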

Author: Fixed.

@@ -392,6 +392,7 @@ def __init__(
     kv_a_layernorm=self.kv_a_layernorm,
     kv_b_proj=self.kv_b_proj,
     o_proj=self.o_proj,
+    ascend_prefix=prefix,
Collaborator: I recommend passing debug_layer_idx instead of ascend_prefix.
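A sketch of what the suggestion amounts to, assuming the layer index is encoded in the module prefix string (e.g. "model.layers.17.self_attn"); extract_layer_index and the debug_layer_idx keyword below are illustrative, not the actual vllm-ascend API:

def extract_layer_index(prefix: str) -> int:
    """Pull the numeric layer index out of a module prefix string."""
    for part in prefix.split("."):
        if part.isdigit():
            return int(part)
    raise ValueError(f"no layer index found in prefix {prefix!r}")


# The attention implementation then receives the resolved integer instead of
# the raw prefix string, e.g. something along the lines of:
#     debug_layer_idx=extract_layer_index(prefix)
print(extract_layer_index("model.layers.17.self_attn"))  # -> 17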

Author: Fixed.

@huiyingCCCC force-pushed the main branch 3 times, most recently from 742a3ee to 8cd3c81, on May 29, 2025 at 09:28
@huiyingCCCC changed the title from "MLA层消除冗余index小算子" to "MLA layer eliminates redundant index operators" on May 30, 2025
Signed-off-by: huiying <chenhuiying4@huawei.com>