
Commit 80c24af

chaunceyjiang authored and Mu Huai committed
[Feature][Frontend]: Deprecate --enable-reasoning (vllm-project#17452)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
1 parent dce1ef0 commit 80c24af

16 files changed: 49 additions, 91 deletions

docs/source/features/reasoning_outputs.md

Lines changed: 5 additions & 8 deletions
@@ -21,11 +21,10 @@ vLLM currently supports the following reasoning models:
 
 ## Quickstart
 
-To use reasoning models, you need to specify the `--enable-reasoning` and `--reasoning-parser` flags when making a request to the chat completion endpoint. The `--reasoning-parser` flag specifies the reasoning parser to use for extracting reasoning content from the model output.
+To use reasoning models, you need to specify the `--reasoning-parser` flag when making a request to the chat completion endpoint. The `--reasoning-parser` flag specifies the reasoning parser to use for extracting reasoning content from the model output.
 
 ```bash
-vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
-    --enable-reasoning --reasoning-parser deepseek_r1
+vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --reasoning-parser deepseek_r1
 ```
 
 Next, make a request to the model that should return the reasoning content in the response.
@@ -140,8 +139,7 @@ Remember to check whether the `reasoning_content` exists in the response before
 The reasoning content is also available in the structured output. The structured output engine like `xgrammar` will use the reasoning content to generate structured output. It is only supported in v0 engine now.
 
 ```bash
-VLLM_USE_V1=0 vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
-    --enable-reasoning --reasoning-parser deepseek_r1
+VLLM_USE_V1=0 vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --reasoning-parser deepseek_r1
 ```
 
 Please note that the `VLLM_USE_V1` environment variable must be set to `0` to use the v0 engine.
@@ -316,9 +314,8 @@ class DeepSeekReasoner(Reasoner):
 
 The structured output engine like `xgrammar` will use `end_token_id` to check if the reasoning content is present in the model output and skip the structured output if it is the case.
 
-Finally, you can enable reasoning for the model by using the `--enable-reasoning` and `--reasoning-parser` flags.
+Finally, you can enable reasoning for the model by using the `--reasoning-parser` flag.
 
 ```bash
-vllm serve <model_tag> \
-    --enable-reasoning --reasoning-parser example
+vllm serve <model_tag> --reasoning-parser example
 ```
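
To see what the updated quickstart returns, here is a minimal client sketch (not part of the commit; it assumes the `vllm serve ... --reasoning-parser deepseek_r1` command above is running locally on the default port, and the model name and prompt are illustrative):

```python
from openai import OpenAI

# Assumes the quickstart server above is listening on localhost:8000.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    messages=[{"role": "user", "content": "9.11 and 9.8, which is greater?"}],
)

message = completion.choices[0].message
# The reasoning parser splits the output into `reasoning_content` and
# `content`; check that `reasoning_content` exists before using it, as
# the docs above recommend.
print("reasoning:", getattr(message, "reasoning_content", None))
print("content:", message.content)
```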

examples/online_serving/openai_chat_completion_structured_outputs_with_reasoning.py

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
 
 ```bash
 vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
-    --enable-reasoning --reasoning-parser deepseek_r1
+    --reasoning-parser deepseek_r1
 ```
 
 This example demonstrates how to generate chat completions from reasoning models

examples/online_serving/openai_chat_completion_tool_calls_with_reasoning.py

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
 
 ```bash
 vllm serve Qwen/QwQ-32B \
-    --enable-reasoning --reasoning-parser deepseek_r1 \
+    --reasoning-parser deepseek_r1 \
     --enable-auto-tool-choice --tool-call-parser hermes
 
 ```

examples/online_serving/openai_chat_completion_with_reasoning.py

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 
 ```bash
 vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
-    --enable-reasoning --reasoning-parser deepseek_r1
+    --reasoning-parser deepseek_r1
 ```
 
 This example demonstrates how to generate chat completions from reasoning models

examples/online_serving/openai_chat_completion_with_reasoning_streaming.py

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 
 ```bash
 vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
-    --enable-reasoning --reasoning-parser deepseek_r1
+    --reasoning-parser deepseek_r1
 ```
 
 Unlike openai_chat_completion_with_reasoning.py, this example demonstrates the

tests/entrypoints/openai/test_chat_with_tool_reasoning.py

Lines changed: 3 additions & 3 deletions
@@ -13,9 +13,9 @@
 @pytest.fixture(scope="module")
 def server():  # noqa: F811
     args = [
-        "--max-model-len", "8192", "--enforce-eager", "--enable-reasoning",
-        "--reasoning-parser", "deepseek_r1", "--enable-auto-tool-choice",
-        "--tool-call-parser", "hermes"
+        "--max-model-len", "8192", "--enforce-eager", "--reasoning-parser",
+        "deepseek_r1", "--enable-auto-tool-choice", "--tool-call-parser",
+        "hermes"
     ]
 
     with RemoteOpenAIServer(MODEL_NAME, args) as remote_server:

tests/entrypoints/openai/test_cli_args.py

Lines changed: 3 additions & 11 deletions
@@ -122,31 +122,23 @@ def test_enable_auto_choice_fails_with_enable_reasoning(serve_parser):
     """Ensure validation fails if reasoning is enabled with auto tool choice"""
     args = serve_parser.parse_args(args=[
         "--enable-auto-tool-choice",
-        "--enable-reasoning",
+        "--reasoning-parser",
+        "deepseek_r1",
     ])
     with pytest.raises(TypeError):
         validate_parsed_serve_args(args)
 
 
-def test_enable_reasoning_passes_with_reasoning_parser(serve_parser):
+def test_passes_with_reasoning_parser(serve_parser):
     """Ensure validation passes if reasoning is enabled
     with a reasoning parser"""
     args = serve_parser.parse_args(args=[
-        "--enable-reasoning",
         "--reasoning-parser",
         "deepseek_r1",
     ])
     validate_parsed_serve_args(args)
 
 
-def test_enable_reasoning_fails_without_reasoning_parser(serve_parser):
-    """Ensure validation fails if reasoning is enabled
-    without a reasoning parser"""
-    args = serve_parser.parse_args(args=["--enable-reasoning"])
-    with pytest.raises(TypeError):
-        validate_parsed_serve_args(args)
-
-
 def test_chat_template_validation_for_happy_paths(serve_parser):
     """Ensure validation passes if the chat template exists"""
     args = serve_parser.parse_args(

vllm/config.py

Lines changed: 2 additions & 3 deletions
@@ -3225,10 +3225,9 @@ def guided_decoding_backend(self, value: GuidedDecodingBackend):
     in the JSON schema. This is only supported for the `guidance` backend and
     is used to better align its behaviour with `outlines` and `xgrammar`."""
 
-    reasoning_backend: Optional[str] = None
+    reasoning_backend: str = ""
     """Select the reasoning parser depending on the model that you're using.
-    This is used to parse the reasoning content into OpenAI API format.
-    Required for `--enable-reasoning`."""
+    This is used to parse the reasoning content into OpenAI API format."""
 
     def compute_hash(self) -> str:
         """

vllm/engine/arg_utils.py

Lines changed: 12 additions & 5 deletions
@@ -365,8 +365,9 @@ class EngineArgs:
     calculate_kv_scales: bool = CacheConfig.calculate_kv_scales
 
     additional_config: Optional[Dict[str, Any]] = None
-    enable_reasoning: Optional[bool] = None
-    reasoning_parser: Optional[str] = DecodingConfig.reasoning_backend
+    enable_reasoning: Optional[bool] = None  # DEPRECATED
+    reasoning_parser: str = DecodingConfig.reasoning_backend
+
     use_tqdm_on_load: bool = LoadConfig.use_tqdm_on_load
 
     def __post_init__(self):
@@ -798,8 +799,15 @@ def add_cli_args(parser: FlexibleArgumentParser) -> FlexibleArgumentParser:
         "--enable-reasoning",
         action="store_true",
         default=False,
-        help="Whether to enable reasoning_content for the model. "
-        "If enabled, the model will be able to generate reasoning content."
+        help=
+        "[DEPRECATED] "
+        "The --enable-reasoning flag is deprecated as of v0.8.6. "
+        "Use --reasoning-parser to specify "
+        "the reasoning parser backend instead. "
+        "This flag (--enable-reasoning) will be "
+        "removed in v0.10.0. "
+        "When --reasoning-parser is specified, "
+        "reasoning mode is automatically enabled."
     )
 
     return parser
@@ -1088,7 +1096,6 @@ def create_engine_config(
             disable_additional_properties=\
                 self.guided_decoding_disable_additional_properties,
             reasoning_backend=self.reasoning_parser
-            if self.enable_reasoning else None,
         )
 
         observability_config = ObservabilityConfig(
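
The CLI side keeps `--enable-reasoning` parseable as a no-op until v0.10.0, so existing launch scripts do not break. A self-contained argparse sketch of that deprecation pattern (illustrative, not the vLLM parser itself):

```python
import argparse

parser = argparse.ArgumentParser()
# Deprecated flag: still accepted so old invocations parse, but ignored.
parser.add_argument(
    "--enable-reasoning",
    action="store_true",
    default=False,
    help="[DEPRECATED] Use --reasoning-parser instead.")
# The replacement: naming a parser is what actually enables reasoning.
parser.add_argument("--reasoning-parser", type=str, default="")

args = parser.parse_args(["--reasoning-parser", "deepseek_r1"])
# Reasoning is now driven solely by the parser name being non-empty.
assert args.reasoning_parser == "deepseek_r1"
```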

vllm/engine/llm_engine.py

Lines changed: 1 addition & 1 deletion
@@ -2096,7 +2096,7 @@ def _build_logits_processors(
             guided_decoding.backend = guided_decoding.backend or \
                 self.decoding_config.backend
 
-            if self.decoding_config.reasoning_backend is not None:
+            if self.decoding_config.reasoning_backend:
                 logger.debug("Building with reasoning backend %s",
                              self.decoding_config.reasoning_backend)

vllm/entrypoints/openai/api_server.py

Lines changed: 1 addition & 2 deletions
@@ -967,7 +967,6 @@ async def init_app_state(
         return_tokens_as_token_ids=args.return_tokens_as_token_ids,
         enable_auto_tools=args.enable_auto_tool_choice,
         tool_parser=args.tool_call_parser,
-        enable_reasoning=args.enable_reasoning,
         reasoning_parser=args.reasoning_parser,
         enable_prompt_tokens_details=args.enable_prompt_tokens_details,
     ) if model_config.runner_type == "generate" else None
@@ -1053,7 +1052,7 @@ async def run_server(args, **uvicorn_kwargs) -> None:
             f"(chose from {{ {','.join(valid_tool_parses)} }})")
 
     valid_reasoning_parses = ReasoningParserManager.reasoning_parsers.keys()
-    if args.enable_reasoning \
+    if args.reasoning_parser \
             and args.reasoning_parser not in valid_reasoning_parses:
         raise KeyError(
             f"invalid reasoning parser: {args.reasoning_parser} "

vllm/entrypoints/openai/cli_args.py

Lines changed: 0 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -284,11 +284,6 @@ def validate_parsed_serve_args(args: argparse.Namespace):
284284
raise TypeError("Error: --enable-auto-tool-choice requires "
285285
"--tool-call-parser")
286286

287-
# Enable reasoning needs a reasoning parser to be valid
288-
if args.enable_reasoning and not args.reasoning_parser:
289-
raise TypeError("Error: --enable-reasoning requires "
290-
"--reasoning-parser")
291-
292287

293288
def create_parser_for_docs() -> FlexibleArgumentParser:
294289
parser_for_docs = FlexibleArgumentParser(

vllm/entrypoints/openai/serving_chat.py

Lines changed: 14 additions & 45 deletions
@@ -58,8 +58,7 @@ def __init__(
         chat_template: Optional[str],
         chat_template_content_format: ChatTemplateContentFormatOption,
         return_tokens_as_token_ids: bool = False,
-        enable_reasoning: bool = False,
-        reasoning_parser: Optional[str] = None,
+        reasoning_parser: str = "",
         enable_auto_tools: bool = False,
         tool_parser: Optional[str] = None,
         enable_prompt_tokens_details: bool = False,
@@ -82,18 +81,17 @@ def __init__(
             " the parallel_tool_calls client option is preset for "
             "compatibility reasons, it will be ignored.")
 
-        self.enable_reasoning: bool = enable_reasoning
         self.reasoning_parser: Optional[Callable[[AnyTokenizer],
                                                  ReasoningParser]] = None
-        if self.enable_reasoning:
+        if reasoning_parser:
             try:
                 self.reasoning_parser = (
                     ReasoningParserManager.get_reasoning_parser(
                         reasoning_parser))
+                assert self.reasoning_parser is not None
             except Exception as e:
-                raise TypeError("Error: --enable-reasoning requires "
-                                f"reasoning_parser:'{reasoning_parser}' "
-                                "which has not been registered") from e
+                raise TypeError(
+                    f"{reasoning_parser=} has not been registered") from e
         self.tool_parser: Optional[Callable[[AnyTokenizer], ToolParser]] = None
         if self.enable_auto_tools:
             try:
@@ -423,15 +421,12 @@ async def chat_completion_stream_generator(
             not tool_choice_function_name
            and self._should_stream_with_auto_tool_parsing(request))
 
-        should_stream_with_reasoning_parsing = (
-            self._should_stream_with_reasoning_parsing(request))
-
         all_previous_token_ids: Optional[list[list[int]]]
         function_name_returned: Optional[list[bool]] = None
 
         # Only one of these will be used, thus previous_texts and
         # all_previous_token_ids will not be used twice in the same iteration.
-        if tool_choice_auto or should_stream_with_reasoning_parsing:
+        if tool_choice_auto or self.reasoning_parser:
             # These are only required in "auto" tool choice case
             previous_texts = [""] * num_choices
             all_previous_token_ids = [[]] * num_choices
@@ -446,20 +441,14 @@ async def chat_completion_stream_generator(
             previous_texts, all_previous_token_ids = None, None
 
         try:
-            # There is no need to check if the reasoning_parser is None
-            # because the should_stream_with_reasoning_parsing check
-            # already ensures that the reasoning_parser is not None.
-            # but the pre-commit hook requires it.
-            if should_stream_with_reasoning_parsing and \
-                    self.reasoning_parser is not None:
+            if self.reasoning_parser:
                 reasoning_parser = self.reasoning_parser(tokenizer)
         except RuntimeError as e:
             logger.exception("Error in reasoning parser creation.")
             data = self.create_streaming_error_response(str(e))
             yield f"data: {data}\n\n"
             yield "data: [DONE]\n\n"
             return
-
         # Prepare the tool parser if it's needed
         try:
             if tool_choice_auto and self.tool_parser:
@@ -592,7 +581,7 @@ async def chat_completion_stream_generator(
                 delta_message: Optional[DeltaMessage]
 
                 # just update previous_texts and previous_token_ids
-                if tool_choice_auto or should_stream_with_reasoning_parsing:
+                if tool_choice_auto or self.reasoning_parser:
                     assert previous_texts is not None
                     assert all_previous_token_ids is not None
                     previous_text = previous_texts[i]
@@ -603,7 +592,7 @@ async def chat_completion_stream_generator(
 
                 # handle streaming deltas for tools with named tool_choice
                 if tool_choice_function_name:
-                    if (self.enable_reasoning
+                    if (self.reasoning_parser
                             and not reasoning_parser.is_reasoning_end(
                                 previous_token_ids)):
                         assert reasoning_parser is not None
@@ -630,7 +619,7 @@ async def chat_completion_stream_generator(
                             current_text = ""
                         else:
                             # Just to add remaining `content`
-                            if self.enable_reasoning:
+                            if self.reasoning_parser:
                                 delta_text = previous_text + delta_text
                                 current_text = ""
 
@@ -660,7 +649,7 @@ async def chat_completion_stream_generator(
 
                 # handle streaming deltas for tools with "auto" tool choice
                 # and reasoning parser
-                elif tool_choice_auto and self.enable_reasoning:
+                elif tool_choice_auto and self.reasoning_parser:
                     assert tool_parser is not None
                     assert reasoning_parser is not None
                     assert added_content_delta_arr is not None
@@ -728,8 +717,7 @@ async def chat_completion_stream_generator(
                             delta_token_ids=output.token_ids,
                             request=request))
                 # when only reasoning
-                elif self.enable_reasoning:
-                    assert reasoning_parser is not None
+                elif self.reasoning_parser:
                     delta_message = (reasoning_parser.
                                      extract_reasoning_content_streaming(
                                          previous_text,
@@ -744,7 +732,7 @@ async def chat_completion_stream_generator(
                     delta_message = DeltaMessage(content=delta_text)
 
                 # update the previous values for the next iteration
-                if tool_choice_auto or should_stream_with_reasoning_parsing:
+                if tool_choice_auto or self.reasoning_parser:
                     assert previous_texts is not None
                     assert all_previous_token_ids is not None
                     previous_texts[i] = current_text
@@ -931,17 +919,9 @@ async def chat_completion_full_generator(
             )
         else:
             logprobs = None
-
-        should_stream_with_reasoning_parsing = (
-            self._should_stream_with_reasoning_parsing(request))
-
-        # In the OpenAI API the finish_reason is "tools_called"
-        # if the tool choice is auto and the model produced a tool
-        # call. The same is not true for named function calls
         auto_tools_called = False
 
-        if should_stream_with_reasoning_parsing and \
-                self.reasoning_parser is not None:
+        if self.reasoning_parser:
            try:
                 reasoning_parser = self.reasoning_parser(tokenizer)
             except RuntimeError as e:
@@ -1176,17 +1156,6 @@ def _should_stream_with_auto_tool_parsing(self,
         return (request.tools and self.tool_parser and self.enable_auto_tools
                 and request.tool_choice in ['auto', None])
 
-    def _should_stream_with_reasoning_parsing(self,
-                                              request: ChatCompletionRequest):
-        """
-        Utility function to check if streamed tokens should go through the
-        reasoning parser that was configured.
-
-        We only want to do this IF reasoning is enabled and a reasoning
-        parser is configured.
-        """
-        return self.enable_reasoning and self.reasoning_parser is not None
-
     def _should_check_for_unstreamed_tool_arg_tokens(
         self,
         delta_message: Optional[DeltaMessage],
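
The net effect of this file's changes: the resolved parser factory becomes the single source of truth, replacing both the `enable_reasoning` boolean and the `_should_stream_with_reasoning_parsing` helper. A distilled sketch of that pattern (simplified names; the registry and parser are stand-ins, not vLLM's actual classes):

```python
from typing import Callable, Optional

# Stand-in for ReasoningParserManager's registry.
REGISTRY: dict[str, Callable[[str], str]] = {
    "deepseek_r1": lambda tokenizer: f"parser({tokenizer})",
}

class ServingChatSketch:
    def __init__(self, reasoning_parser: str = ""):
        self.reasoning_parser: Optional[Callable[[str], str]] = None
        if reasoning_parser:  # "" (the new default) leaves reasoning off
            try:
                self.reasoning_parser = REGISTRY[reasoning_parser]
            except KeyError as e:
                raise TypeError(
                    f"{reasoning_parser=} has not been registered") from e

    def make_parser(self, tokenizer: str) -> Optional[str]:
        # Every branch that previously consulted enable_reasoning or the
        # helper now just tests whether the factory was resolved.
        if self.reasoning_parser:
            return self.reasoning_parser(tokenizer)
        return None

assert ServingChatSketch().make_parser("tok") is None
assert ServingChatSketch("deepseek_r1").make_parser("tok") == "parser(tok)"
```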

vllm/model_executor/guided_decoding/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -103,7 +103,7 @@ async def get_guided_decoding_logits_processor(
         reasoning_backend: str | None = None) -> LogitsProcessor | None:
 
     reasoner = None
-    if reasoning_backend is not None:
+    if reasoning_backend:
         reasoner_class = ReasoningParserManager.get_reasoning_parser(
             reasoning_backend)
         reasoner = reasoner_class(tokenizer)
@@ -146,7 +146,7 @@ def get_local_guided_decoding_logits_processor(
     guided_params = maybe_backend_fallback(guided_params)
 
     reasoner = None
-    if reasoning_backend is not None:
+    if reasoning_backend:
         reasoner_class = ReasoningParserManager.get_reasoning_parser(
             reasoning_backend)
         reasoner = reasoner_class(tokenizer)

vllm/model_executor/guided_decoding/outlines_logits_processors.py

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ def __call__(self, input_ids: List[int],
         """Use the FSM to bias the logits before sampling the next token."""
 
         # Skip the structured logits processing if reasoning is not finished.
-        # reasoner is not None only when `--enable-reasoning` is set.
+        # reasoner is not None only when `--reasoning-parser` is set.
         if self._reasoner is not None:
             if not self._reasoner.is_reasoning_end(input_ids):
                 return scores
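
The comment fix above is the last reference to the old flag; the surrounding gate is unchanged. As a minimal sketch of that reasoner gate (assumed stub interfaces, not the real outlines processor):

```python
from typing import List, Optional

class ReasonerStub:
    """Stand-in for a registered reasoning parser; token id 42 marks the
    end of reasoning purely for illustration."""

    def is_reasoning_end(self, input_ids: List[int]) -> bool:
        return 42 in input_ids

def bias_logits(reasoner: Optional[ReasonerStub], input_ids: List[int],
                scores: List[float]) -> List[float]:
    # The reasoner is only set when --reasoning-parser is configured;
    # while reasoning is unfinished, the FSM masking is skipped entirely.
    if reasoner is not None and not reasoner.is_reasoning_end(input_ids):
        return scores
    return [s - 1.0 for s in scores]  # stand-in for FSM-based masking

print(bias_logits(ReasonerStub(), [1, 2], [0.5]))   # [0.5]  (unchanged)
print(bias_logits(ReasonerStub(), [1, 42], [0.5]))  # [-0.5] (masked)
```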
