add optimze of dsv3 #970


Open · wants to merge 1 commit into main


Conversation

momo609

@momo609 momo609 commented May 27, 2025

What this PR does / why we need it?

Optimize the performance of the calculation logic in the sampler and in deepseekv2.

Does this PR introduce any user-facing change?

Added a VLLM_ENABLE_TOPK_OPTIMZE config option to the sampler.

How was this patch tested?

pytest test_sampler.py
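The new toggle is an environment variable. As a minimal sketch of how such a flag is typically consumed (only the VLLM_ENABLE_TOPK_OPTIMZE name comes from this PR; the helper and its default are assumptions, not the PR's actual code):

```python
import os

# Hypothetical helper: the flag name VLLM_ENABLE_TOPK_OPTIMZE comes from
# this PR, but this function and its "0" default are illustrative only.
def topk_optimize_enabled() -> bool:
    # vLLM-style environment flags are usually the strings "0"/"1".
    return os.environ.get("VLLM_ENABLE_TOPK_OPTIMZE", "0") == "1"

# Enable the optimized top-k path for this process.
os.environ["VLLM_ENABLE_TOPK_OPTIMZE"] = "1"
print(topk_optimize_enabled())  # prints True once the flag is set to "1"
```

Setting the variable before launching the server (or before importing the sampler) is the usual pattern, since env-based flags are often read once at import time.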

Collaborator

@MengqingCao left a comment


  1. Please add a PR description.
  2. Run bash format.sh locally to fix the lint failures.

@momo609 momo609 force-pushed the main branch 3 times, most recently from 52aff53 to c35b678 Compare May 30, 2025 07:06
@momo609
Author

momo609 commented May 30, 2025

@wangxiyuan

@momo609 momo609 force-pushed the main branch 3 times, most recently from efe0a26 to 90ae7ec Compare May 30, 2025 08:44
Signed-off-by: wangxiaoxin (A) <w00664509@china.huawei.com>
@@ -225,8 +225,7 @@ def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
enable_force_load_balance = False
num_tokens, hidden_dim = hidden_states.shape

if self.n_shared_experts is not None:
    shared_output = self.shared_experts(hidden_states)
old_hidden_states = hidden_states.detach()
Collaborator

Can you upload the performance profiling for this part? To my knowledge, detaching this variable from the PyTorch graph can't actually trigger parallel execution.

_info "====> Start simple_test"
simple_test
_info "====> Start quickstart_offline_test"
quickstart_offline_test
_info "====> Start quickstart_online_test"
quickstart_online_test
_info "====> Start quickstart_offline_test_topk"
Collaborator

We should add the offline test in Python by setting os.environ rather than here:

https://github.com/vllm-project/vllm-ascend/blob/main/tests%2Fsinglecard%2Ftest_offline_inference.py#L44-L44
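A rough sketch of what this suggestion could look like (the test name and body are hypothetical; only the os.environ approach and the flag name come from this thread):

```python
import os

# Hypothetical pytest-style test: set the flag via os.environ inside the
# Python test rather than in a shell script, and always restore it so the
# flag does not leak into other tests.
def test_offline_inference_with_topk_optimize():
    os.environ["VLLM_ENABLE_TOPK_OPTIMZE"] = "1"
    try:
        # ... run the existing offline inference flow here ...
        assert os.environ["VLLM_ENABLE_TOPK_OPTIMZE"] == "1"
    finally:
        os.environ.pop("VLLM_ENABLE_TOPK_OPTIMZE", None)

test_offline_inference_with_topk_optimize()
```

With pytest, the same cleanup is more idiomatically done via the monkeypatch fixture's setenv, which undoes the change automatically after the test.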

5 participants