[Bug]: nightly version: EngineCore encountered a fatal error. #17276


Closed
Zhiyuan-Fan opened this issue Apr 28, 2025 · 10 comments · Fixed by #17283
Labels
bug Something isn't working

Comments

@Zhiyuan-Fan

Your current environment

The output of `python collect_env.py` was not provided.

🐛 Describe the bug

vllm serve Qwen/Qwen2.5-VL-3B-Instruct

INFO 04-28 02:37:03 [async_llm.py:252] Added request 13_chatcmpl-c78d39a6d1d8469f90f3bda9bd41ca6a.                                                                                                                                           
INFO 04-28 02:37:03 [async_llm.py:252] Added request 14_chatcmpl-c78d39a6d1d8469f90f3bda9bd41ca6a.                                                                                                                                           
INFO 04-28 02:37:03 [async_llm.py:252] Added request 15_chatcmpl-c78d39a6d1d8469f90f3bda9bd41ca6a.                                                                                                                                           
ERROR 04-28 02:37:03 [core.py:398] EngineCore encountered a fatal error.                                                                                                                                                                     
ERROR 04-28 02:37:03 [core.py:398] Traceback (most recent call last):                                                                                                                                                                        
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 389, in run_engine_core                                                                              
ERROR 04-28 02:37:03 [core.py:398]     engine_core.run_busy_loop()                                                                                                                                                                           
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 413, in run_busy_loop                                                                                
ERROR 04-28 02:37:03 [core.py:398]     self._process_engine_step()                                                                                                                                                                           
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 438, in _process_engine_step                                                                         
ERROR 04-28 02:37:03 [core.py:398]     outputs = self.step_fn()                                                                                                                                                                              
ERROR 04-28 02:37:03 [core.py:398]               ^^^^^^^^^^^^^^                                                                                                                                                                              
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 203, in step                                                                                         
ERROR 04-28 02:37:03 [core.py:398]     output = self.model_executor.execute_model(scheduler_output)                                                                                                                                          
ERROR 04-28 02:37:03 [core.py:398]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                          
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 86, in execute_model                                                                           
ERROR 04-28 02:37:03 [core.py:398]     output = self.collective_rpc("execute_model",                                                                                                                                                         
ERROR 04-28 02:37:03 [core.py:398]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                         
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc                                                                     
ERROR 04-28 02:37:03 [core.py:398]     answer = run_method(self.driver_worker, method, args, kwargs)                                                                                                                                         
ERROR 04-28 02:37:03 [core.py:398]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                         
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/utils.py", line 2456, in run_method                                                                                           
ERROR 04-28 02:37:03 [core.py:398]     return func(*args, **kwargs)                                                                                                                                                                          
ERROR 04-28 02:37:03 [core.py:398]            ^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                          
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context                                                                         
ERROR 04-28 02:37:03 [core.py:398]     return func(*args, **kwargs)                                                                                                                                                                          
ERROR 04-28 02:37:03 [core.py:398]            ^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                          
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 268, in execute_model                                                                          
ERROR 04-28 02:37:03 [core.py:398]     output = self.model_runner.execute_model(scheduler_output)                                                                                                                                            
ERROR 04-28 02:37:03 [core.py:398]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^               
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 04-28 02:37:03 [core.py:398]     return func(*args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1092, in execute_model
ERROR 04-28 02:37:03 [core.py:398]     output = self.model( 
ERROR 04-28 02:37:03 [core.py:398]              ^^^^^^^^^^^ 
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-28 02:37:03 [core.py:398]     return self._call_impl(*args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-28 02:37:03 [core.py:398]     return forward_call(*args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/model_executor/models/qwen2_5_vl.py", line 1106, in forward
ERROR 04-28 02:37:03 [core.py:398]     hidden_states = self.language_model.model(
ERROR 04-28 02:37:03 [core.py:398]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/compilation/decorators.py", line 245, in __call__
ERROR 04-28 02:37:03 [core.py:398]     model_output = self.forward(*args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/model_executor/models/qwen2.py", line 325, in forward
ERROR 04-28 02:37:03 [core.py:398]     def forward(
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-28 02:37:03 [core.py:398]     return self._call_impl(*args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-28 02:37:03 [core.py:398]     return forward_call(*args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 745, in _fn
ERROR 04-28 02:37:03 [core.py:398]     return fn(*args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398]            ^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/fx/graph_module.py", line 822, in call_wrapped
ERROR 04-28 02:37:03 [core.py:398]     return self._wrapped_call(self, *args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/fx/graph_module.py", line 400, in __call__
ERROR 04-28 02:37:03 [core.py:398]     raise e
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/fx/graph_module.py", line 387, in __call__
ERROR 04-28 02:37:03 [core.py:398]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
ERROR 04-28 02:37:03 [core.py:398]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-28 02:37:03 [core.py:398]     return self._call_impl(*args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-28 02:37:03 [core.py:398]     return forward_call(*args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398]   File "<eval_with_key>.74", line 270, in forward
ERROR 04-28 02:37:03 [core.py:398]     submod_1 = self.submod_1(getitem, s0, getitem_1, getitem_2, getitem_3);  getitem = getitem_1 = getitem_2 = submod_1 = None
ERROR 04-28 02:37:03 [core.py:398]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/fx/graph_module.py", line 822, in call_wrapped
ERROR 04-28 02:37:03 [core.py:398]     return self._wrapped_call(self, *args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/fx/graph_module.py", line 822, in call_wrapped
ERROR 04-28 02:37:03 [core.py:398]     return self._wrapped_call(self, *args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/fx/graph_module.py", line 400, in __call__
ERROR 04-28 02:37:03 [core.py:398]     raise e
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/fx/graph_module.py", line 387, in __call__
ERROR 04-28 02:37:03 [core.py:398]     return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
ERROR 04-28 02:37:03 [core.py:398]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
ERROR 04-28 02:37:03 [core.py:398]     return self._call_impl(*args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
ERROR 04-28 02:37:03 [core.py:398]     return forward_call(*args, **kwargs)
ERROR 04-28 02:37:03 [core.py:398]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398]   File "<eval_with_key>.2", line 5, in forward
ERROR 04-28 02:37:03 [core.py:398]     unified_attention_with_output = torch.ops.vllm.unified_attention_with_output(query_2, key_2, value, output_3, 'language_model.model.layers.0.self_attn.attn');  query_2 = key_2 = value = output_3 = unified_attention_with_output = None
ERROR 04-28 02:37:03 [core.py:398]                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/_ops.py", line 1123, in __call__
ERROR 04-28 02:37:03 [core.py:398]     return self._op(*args, **(kwargs or {}))
ERROR 04-28 02:37:03 [core.py:398]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/attention/layer.py", line 416, in unified_attention_with_output
ERROR 04-28 02:37:03 [core.py:398]     self.impl.forward(self,
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/attention/backends/flash_attn.py", line 598, in forward
ERROR 04-28 02:37:03 [core.py:398]     cascade_attention(
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/attention/backends/flash_attn.py", line 730, in cascade_attention
ERROR 04-28 02:37:03 [core.py:398]     prefix_output, prefix_lse = flash_attn_varlen_func(
ERROR 04-28 02:37:03 [core.py:398]                                 ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/vllm_flash_attn/flash_attn_interface.py", line 252, in flash_attn_varlen_func
ERROR 04-28 02:37:03 [core.py:398]     out, softmax_lse, _, _ = torch.ops._vllm_fa3_C.fwd(
ERROR 04-28 02:37:03 [core.py:398]                              ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/_ops.py", line 1123, in __call__
ERROR 04-28 02:37:03 [core.py:398]     return self._op(*args, **(kwargs or {}))
ERROR 04-28 02:37:03 [core.py:398]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [core.py:398] RuntimeError: scheduler_metadata must have shape (metadata_size)
Process EngineCore_0:
ERROR 04-28 02:37:03 [async_llm.py:399] AsyncLLM output_handler failed.
ERROR 04-28 02:37:03 [async_llm.py:399] Traceback (most recent call last):
ERROR 04-28 02:37:03 [async_llm.py:399]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 357, in output_handler
ERROR 04-28 02:37:03 [async_llm.py:399]     outputs = await engine_core.get_output_async()
ERROR 04-28 02:37:03 [async_llm.py:399]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-28 02:37:03 [async_llm.py:399]   File "/home/zhiyuan/anaconda3/envs/vllm/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 716, in get_output_async
ERROR 04-28 02:37:03 [async_llm.py:399]     raise self._format_exception(outputs) from None
ERROR 04-28 02:37:03 [async_llm.py:399] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
INFO 04-28 02:37:03 [async_llm.py:324] Request chatcmpl-931e30a747354f9eb969d74e0917a5b8 failed (engine dead).
INFO 04-28 02:37:03 [async_llm.py:324] Request chatcmpl-bb885cde9cc64d099805f73700577ffc failed (engine dead).
INFO 04-28 02:37:03 [async_llm.py:324] Request chatcmpl-e3401e02484243b48e9e73750055b16a failed (engine dead).
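For context, the `RuntimeError` at the bottom of the trace comes from the FA3 forward kernel rejecting a `scheduler_metadata` tensor whose shape is not exactly `(metadata_size,)`. A self-contained sketch of that shape invariant (the helper name is hypothetical, not vLLM's actual API):

```python
def metadata_shape_ok(shape: tuple, metadata_size: int) -> bool:
    """Check the invariant the FA3 kernel asserts: scheduler_metadata
    must be a 1-D tensor with exactly metadata_size elements."""
    return len(shape) == 1 and shape[0] == metadata_size

print(metadata_shape_ok((17,), 17))    # True: matching 1-D shape
print(metadata_shape_ok((4, 17), 17))  # False: wrong rank -> the kernel raises
```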

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
@Zhiyuan-Fan Zhiyuan-Fan added the bug Something isn't working label Apr 28, 2025
@Zhiyuan-Fan
Author

@DarkLight1337 Alternatively, could you point me to a stable vLLM version for deploying Qwen/Qwen2.5-VL-3B-Instruct?

@DarkLight1337
Member

Are you using the latest nightly? I vaguely recall this issue being fixed recently.

@Zhiyuan-Fan
Copy link
Author

Zhiyuan-Fan commented Apr 28, 2025

>>> vllm.__version__
'0.8.5.dev285+gd8bccde68'

@Zhiyuan-Fan
Copy link
Author

I just installed this version two hours ago:
pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly

@DarkLight1337
Member

cc @LucasWilkinson

@jeremyzhang866

jeremyzhang866 commented Apr 28, 2025

> Are you using the latest nightly? I vaguely recall this issue being fixed recently

could you display the issue or pr? :)

@DarkLight1337
Member

#16998

@LucasWilkinson
Collaborator

LucasWilkinson commented Apr 28, 2025

@Zhiyuan-Fan Can you please try: #17283

And/or provide a script to reproduce the bug? Thanks for the bug report!

@Zhiyuan-Fan
Author

Zhiyuan-Fan commented Apr 28, 2025

@LucasWilkinson
PR: #17283
has solved this issue, thanks!


How to reproduce this issue (vLLM version '0.8.5.dev285+gd8bccde68'):

nice -n -20 vllm serve Qwen/Qwen2.5-VL-7B-Instruct --port 12346

then send concurrent requests to the endpoint.
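The concurrent-request step can be sketched as a small stdlib-only script. The port and model follow the serve command above; the helper name, worker count, and prompts are my own assumptions, and the server must be running for the requests to succeed:

```python
import json
import sys
from concurrent.futures import ThreadPoolExecutor
from urllib import request

BASE_URL = "http://localhost:12346/v1/chat/completions"  # port from the serve command above
MODEL = "Qwen/Qwen2.5-VL-7B-Instruct"

def build_chat_payload(model: str, text: str) -> dict:
    # Minimal OpenAI-compatible chat payload; a text-only prompt suffices here.
    return {"model": model, "messages": [{"role": "user", "content": text}]}

def send_request(i: int) -> int:
    body = json.dumps(build_chat_payload(MODEL, f"Describe request {i}.")).encode()
    req = request.Request(BASE_URL, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return resp.status

if __name__ == "__main__" and "--run" in sys.argv:
    # Fire many requests at once to exercise the cascade-attention path
    # shown in the traceback.
    with ThreadPoolExecutor(max_workers=16) as pool:
        print(list(pool.map(send_request, range(16))))
```

Run it with `python repro.py --run` while the server is up; if the bug is present, the server log should show the fatal `EngineCore` error shortly after.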

> @Zhiyuan-Fan Can you please try: #17283
>
> And/or provide a script to reproduce the bug? Thanks for the bug report!

@LucasWilkinson
Collaborator

Thanks for checking!
