add optimze of dsv3 #970
Conversation
- Please add a PR description
- Run `bash format.sh` locally to fix the lint failures
Signed-off-by: wangxiaoxin (A) <w00664509@china.huawei.com>
@@ -225,8 +225,7 @@ def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
            enable_force_load_balance = False
        num_tokens, hidden_dim = hidden_states.shape

        if self.n_shared_experts is not None:
            shared_output = self.shared_experts(hidden_states)
        old_hidden_states = hidden_states.detach()
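For context on the snippet above (a generic sketch, not the actual vLLM implementation; class and layer names here are hypothetical): in DeepSeek-style MoE layers, the shared experts run on every token alongside the routed experts, so the shared-expert forward has no data dependency on the routing path.

```python
import torch
import torch.nn as nn

class MoEWithSharedExperts(nn.Module):
    """Generic sketch of an MoE block with shared experts: the shared
    experts process every token, independently of the routed-expert path."""

    def __init__(self, hidden_dim: int = 16):
        super().__init__()
        # Stand-ins for the real expert networks.
        self.shared_experts = nn.Linear(hidden_dim, hidden_dim)
        self.routed_experts = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Shared-expert output depends only on the input hidden states,
        # so it can be computed before (or interleaved with) routing.
        shared_output = self.shared_experts(hidden_states)
        # Placeholder for the top-k routed-expert computation.
        routed_output = self.routed_experts(hidden_states)
        return shared_output + routed_output

x = torch.randn(3, 16)
out = MoEWithSharedExperts()(x)
assert out.shape == x.shape
```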
Can you upload performance profiling for this part? To my knowledge, detaching this variable from the PyTorch graph doesn't actually trigger parallel execution.
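As a hedged illustration of the reviewer's point: `Tensor.detach()` only removes the result from the autograd graph and returns a view sharing the same storage; it does not copy data or dispatch any asynchronous or parallel work.

```python
import torch

x = torch.randn(4, 8, requires_grad=True)
y = x.detach()

# detach() returns a view on the same storage with autograd tracking
# removed; no new computation is launched.
assert y.data_ptr() == x.data_ptr()
assert y.requires_grad is False

# Mutating the detached view is visible through the original tensor,
# confirming the two share memory rather than being separate results.
y[0, 0] = 42.0
assert x[0, 0].item() == 42.0
```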
_info "====> Start simple_test"
simple_test
_info "====> Start quickstart_offline_test"
quickstart_offline_test
_info "====> Start quickstart_online_test"
quickstart_online_test
_info "====> Start quickstart_offline_test_topk"
We should add the offline test in Python by setting os.environ there, rather than here:
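A minimal sketch of what the reviewer suggests (the helper function and flag value are assumptions): set the environment variable inside the Python test itself, before the code under test reads it, instead of exporting it from the shell script.

```python
import os

# Set the feature flag before importing/initializing the code under test,
# since env-gated options are typically read once at import or init time.
os.environ["VLLM_ENABLE_TOPK_OPTIMZE"] = "1"

def topk_optimize_enabled() -> bool:
    # Hypothetical helper mirroring how such a flag is usually parsed.
    return os.environ.get("VLLM_ENABLE_TOPK_OPTIMZE", "0") == "1"

def test_topk_optimize_flag():
    assert topk_optimize_enabled()

test_topk_optimize_flag()
```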
What this PR does / why we need it?
Optimizes the performance of the calculation logic in the sampler and DeepseekV2.
Does this PR introduce any user-facing change?
Added the VLLM_ENABLE_TOPK_OPTIMZE config option to the sampler.
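The PR description does not detail the optimization itself; as a generic sketch (not this PR's implementation), a top-k path in a sampler typically restricts the softmax and multinomial draw to the k highest-logit candidates:

```python
import torch

def sample_topk(logits: torch.Tensor, k: int) -> torch.Tensor:
    """Generic top-k sampling sketch: draw only from the k highest-logit
    tokens. The actual optimization gated by VLLM_ENABLE_TOPK_OPTIMZE in
    this PR is not described in the description and may differ."""
    topk_vals, topk_idx = torch.topk(logits, k, dim=-1)
    # Softmax over only the k candidates, then sample among them.
    probs = torch.softmax(topk_vals, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)
    # Map the sampled position back to the original vocabulary index.
    return topk_idx.gather(-1, choice).squeeze(-1)

logits = torch.tensor([[0.1, 5.0, 0.2, 4.9]])
token = sample_topk(logits, k=2)
assert token.item() in (1, 3)  # only the top-2 tokens can be drawn
```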
How was this patch tested?
pytest test_sampler.py