MLA layer eliminates redundant index operators #993
base: main
- Please add a PR description.
- Run bash format.sh locally to fix the lint failures.
vllm_ascend/attention/attention.py
Outdated
q_pe = self.rope_single(q_pe, cos, sin)
k_pe, k_nope = self.exec_kv(hidden_states_or_kv_c_normed, cos, sin,
if self.layer_idx == 0 or self.cos is None or self.sin is None:
Could you add a comment explaining why self.cos and self.sin are updated only when layer_idx == 0?
During autoregressive decoding, the cos and sin values are identical for every layer (for example, all 61 layers). They therefore only need to be computed in the first layer, and subsequent layers can reuse them directly.
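The reuse pattern can be sketched roughly as below. This is only an illustration and not the code in this PR: RotaryCache, Layer, and run_decode_step are hypothetical names, the tables are plain Python lists rather than tensors, and the cache is assumed to be a single object shared by all layers. Only the condition in rotary_tables mirrors the check shown in the diff.

```python
import math
from typing import List, Optional, Tuple


class RotaryCache:
    """Hypothetical holder for the cos/sin tables shared by all layers."""

    def __init__(self) -> None:
        self.cos: Optional[List[float]] = None
        self.sin: Optional[List[float]] = None
        self.compute_calls = 0  # counts how often the tables are rebuilt

    def compute(self, position: int, rot_dim: int, base: float = 10000.0) -> None:
        # Standard RoPE angles: position * base^(-2i / rot_dim) per pair i.
        angles = [position * base ** (-2.0 * i / rot_dim)
                  for i in range(rot_dim // 2)]
        self.cos = [math.cos(a) for a in angles]
        self.sin = [math.sin(a) for a in angles]
        self.compute_calls += 1


class Layer:
    """Hypothetical attention layer that only knows its index and the cache."""

    def __init__(self, layer_idx: int, cache: RotaryCache) -> None:
        self.layer_idx = layer_idx
        self.cache = cache

    def rotary_tables(self, position: int,
                      rot_dim: int) -> Tuple[List[float], List[float]]:
        # cos/sin depend only on the token position, not on the layer, so
        # they are refreshed in layer 0 (or when nothing is cached yet) and
        # reused by every later layer in the same decode step.
        if (self.layer_idx == 0 or self.cache.cos is None
                or self.cache.sin is None):
            self.cache.compute(position, rot_dim)
        return self.cache.cos, self.cache.sin


def run_decode_step(layers: List[Layer], position: int,
                    rot_dim: int) -> List[Tuple[List[float], List[float]]]:
    """Run one decode step through all layers, in order."""
    return [layer.rotary_tables(position, rot_dim) for layer in layers]


cache = RotaryCache()
layers = [Layer(i, cache) for i in range(61)]
tables = run_decode_step(layers, position=5, rot_dim=8)
```

With 61 layers, one decode step calls compute once instead of 61 times. The sketch relies on layer 0 always running first in each step, so the cached values are never stale for the later layers.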
Thanks for the explanation. Let's add it as a comment in the code.
fixed
vllm_ascend/models/deepseek_v2.py
Outdated
@@ -392,6 +392,7 @@ def __init__(
kv_a_layernorm=self.kv_a_layernorm,
kv_b_proj=self.kv_b_proj,
o_proj=self.o_proj,
ascend_prefix=prefix,
I recommend passing debug_layer_idx instead of ascend_prefix.
fixed
742a3ee to 8cd3c81
Signed-off-by: huiying <chenhuiying4@huawei.com>
What this PR does / why we need it?
During autoregressive decoding, the cos and sin values are identical for every layer (for example, all 61 layers). They therefore only need to be computed in the first layer, and subsequent layers can reuse them directly.
Does this PR introduce any user-facing change?
How was this patch tested?