Add Hugging Face Chat Completion support to Inference Plugin #127254
Merged
jonathan-buttner merged 45 commits into elastic:main from Jan-Kazlouski-elastic:feature/hugging-face-chat-completion-integration on May 19, 2025
Commits
63f21de Add Hugging Face Chat Completion support to Inference Plugin (Jan-Kazlouski-elastic)
6b7dd2e Merge remote-tracking branch 'refs/remotes/origin/main' into feature/… (Jan-Kazlouski-elastic)
65e4060 Add support for streaming chat completion task for HuggingFace (Jan-Kazlouski-elastic)
404f640 [CI] Auto commit changes from spotless (elasticsearchmachine)
ceebb9a Add support for non-streaming completion task for HuggingFace (Jan-Kazlouski-elastic)
acaa35b Remove RequestManager for HF Chat Completion Task (Jan-Kazlouski-elastic)
91fa92e Merge remote-tracking branch 'refs/remotes/origin/main' into feature/… (Jan-Kazlouski-elastic)
ff3ef50 Refactored Hugging Face Completion Service Settings, removed Request … (Jan-Kazlouski-elastic)
965093b Refactored Hugging Face Action Creator, added Unit Tests (Jan-Kazlouski-elastic)
6757b07 Add Hugging Face Server Test (Jan-Kazlouski-elastic)
58ea9fd Merge remote-tracking branch 'origin/main' into feature/hugging-face-… (Jan-Kazlouski-elastic)
df845eb [CI] Auto commit changes from spotless (elasticsearchmachine)
cc24e68 Merge remote-tracking branch 'origin/main' into feature/hugging-face-… (Jan-Kazlouski-elastic)
5bbe3b7 Removed parameters from media type for Chat Completion Request and un… (Jan-Kazlouski-elastic)
3684816 Removed OpenAI default URL in HuggingFaceService's configuration, fix… (Jan-Kazlouski-elastic)
7670d2c Refactor error message handling in HuggingFaceActionCreator and Huggi… (Jan-Kazlouski-elastic)
6630be7 Update minimal supported version and add Hugging Face transport versi… (Jan-Kazlouski-elastic)
1efb2ee Made modelId field optional in HuggingFaceChatCompletionModel, update… (Jan-Kazlouski-elastic)
61537d0 Removed max input tokens field from HuggingFaceChatCompletionServiceS… (Jan-Kazlouski-elastic)
64c0685 Removed if statement checking TransportVersion for HuggingFaceChatCom… (Jan-Kazlouski-elastic)
4688901 Removed getFirst() method calls for backport compatibility (Jan-Kazlouski-elastic)
bfc8072 Made HuggingFaceChatCompletionServiceSettingsTests extend AbstractBWC… (Jan-Kazlouski-elastic)
13ef13b Refactored tests to use stripWhitespace method for readability (Jan-Kazlouski-elastic)
129caaf Refactored javadoc for HuggingFaceService (Jan-Kazlouski-elastic)
214de5f Renamed HF chat completion TransportVersion constant names (Jan-Kazlouski-elastic)
d3411d6 Added random string generation in unit test (Jan-Kazlouski-elastic)
e170b96 Refactored javadocs for HuggingFace requests (Jan-Kazlouski-elastic)
473dee6 Refactored tests to reduce duplication (Jan-Kazlouski-elastic)
cb03100 Added changelog file (Jan-Kazlouski-elastic)
c856853 Merge remote-tracking branch 'origin/main' into feature/hugging-face-… (Jan-Kazlouski-elastic)
bd2e601 Merge remote-tracking branch 'refs/remotes/origin/main' into feature/… (Jan-Kazlouski-elastic)
aae528a Add HuggingFaceChatCompletionResponseHandler and associated tests (Jan-Kazlouski-elastic)
82f8049 Refactor error handling in HuggingFaceServiceTests to standardize err… (Jan-Kazlouski-elastic)
b0679d5 Merge remote-tracking branch 'origin/main' into feature/hugging-face-… (Jan-Kazlouski-elastic)
2fa3dff Merge remote-tracking branch 'origin/main' into feature/hugging-face-… (Jan-Kazlouski-elastic)
cdb3c1c Refactor HuggingFace error handling to improve response structure and… (Jan-Kazlouski-elastic)
9370b57 Merge remote-tracking branch 'origin/main' into feature/hugging-face-… (Jan-Kazlouski-elastic)
9044bee Allowing null function name for hugging face models (jonathan-buttner)
e72a312 Merge branch 'main' of https://github.com/Jan-Kazlouski-elastic/elast… (Jan-Kazlouski-elastic)
e2cb334 Merge branch 'main' of https://github.com/Jan-Kazlouski-elastic/elast… (Jan-Kazlouski-elastic)
a4b5d2c Merge branch 'main' of https://github.com/Jan-Kazlouski-elastic/elast… (Jan-Kazlouski-elastic)
c5988ed Merge branch 'main' of https://github.com/Jan-Kazlouski-elastic/elast… (Jan-Kazlouski-elastic)
1547559 Merge branch 'main' of https://github.com/Jan-Kazlouski-elastic/elast… (Jan-Kazlouski-elastic)
71c6057 Merge branch 'main' into feature/hugging-face-chat-completion-integra… (Jan-Kazlouski-elastic)
228fffa Merge branch 'main' of https://github.com/Jan-Kazlouski-elastic/elast… (Jan-Kazlouski-elastic)
@@ -0,0 +1,5 @@
pr: 127254
summary: "[ML] Add HuggingFace Chat Completion support to the Inference Plugin"
area: Machine Learning
type: enhancement
issues: []
171 changes: 171 additions & 0 deletions
...search/xpack/inference/services/huggingface/HuggingFaceChatCompletionResponseHandler.java
@@ -0,0 +1,171 @@
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0; you may not use this file except in compliance with the Elastic License
 * 2.0.
 */

package org.elasticsearch.xpack.inference.services.huggingface;

import org.elasticsearch.core.Nullable;
import org.elasticsearch.rest.RestStatus;
import org.elasticsearch.xcontent.ConstructingObjectParser;
import org.elasticsearch.xcontent.ParseField;
import org.elasticsearch.xcontent.XContentFactory;
import org.elasticsearch.xcontent.XContentParser;
import org.elasticsearch.xcontent.XContentParserConfiguration;
import org.elasticsearch.xcontent.XContentType;
import org.elasticsearch.xpack.core.inference.results.UnifiedChatCompletionException;
import org.elasticsearch.xpack.inference.external.http.HttpResult;
import org.elasticsearch.xpack.inference.external.http.retry.ErrorResponse;
import org.elasticsearch.xpack.inference.external.http.retry.ResponseParser;
import org.elasticsearch.xpack.inference.external.request.Request;
import org.elasticsearch.xpack.inference.services.huggingface.response.HuggingFaceErrorResponseEntity;
import org.elasticsearch.xpack.inference.services.openai.OpenAiUnifiedChatCompletionResponseHandler;

import java.util.Locale;
import java.util.Optional;

import static org.elasticsearch.core.Strings.format;

/**
 * Handles streaming chat completion responses and error parsing for Hugging Face inference endpoints.
 * Adapts the OpenAI handler to support Hugging Face's simpler error schema with fields like "message" and "http_status_code".
 */
public class HuggingFaceChatCompletionResponseHandler extends OpenAiUnifiedChatCompletionResponseHandler {

    private static final String HUGGING_FACE_ERROR = "hugging_face_error";

    public HuggingFaceChatCompletionResponseHandler(String requestType, ResponseParser parseFunction) {
        super(requestType, parseFunction, HuggingFaceErrorResponseEntity::fromResponse);
    }

    @Override
    protected Exception buildError(String message, Request request, HttpResult result, ErrorResponse errorResponse) {
        assert request.isStreaming() : "Only streaming requests support this format";
        var responseStatusCode = result.response().getStatusLine().getStatusCode();
        if (request.isStreaming()) {
            var errorMessage = errorMessage(message, request, result, errorResponse, responseStatusCode);
            var restStatus = toRestStatus(responseStatusCode);
            return errorResponse instanceof HuggingFaceErrorResponseEntity
                ? new UnifiedChatCompletionException(
                    restStatus,
                    errorMessage,
                    HUGGING_FACE_ERROR,
                    restStatus.name().toLowerCase(Locale.ROOT)
                )
                : new UnifiedChatCompletionException(
                    restStatus,
                    errorMessage,
                    createErrorType(errorResponse),
                    restStatus.name().toLowerCase(Locale.ROOT)
                );
        } else {
            return super.buildError(message, request, result, errorResponse);
        }
    }

    @Override
    protected Exception buildMidStreamError(Request request, String message, Exception e) {
        var errorResponse = StreamingHuggingFaceErrorResponseEntity.fromString(message);
        if (errorResponse instanceof StreamingHuggingFaceErrorResponseEntity streamingHuggingFaceErrorResponseEntity) {
            return new UnifiedChatCompletionException(
                RestStatus.INTERNAL_SERVER_ERROR,
                format(
                    "%s for request from inference entity id [%s]. Error message: [%s]",
                    SERVER_ERROR_OBJECT,
                    request.getInferenceEntityId(),
                    errorResponse.getErrorMessage()
                ),
                HUGGING_FACE_ERROR,
                extractErrorCode(streamingHuggingFaceErrorResponseEntity)
            );
        } else if (e != null) {
            return UnifiedChatCompletionException.fromThrowable(e);
        } else {
            return new UnifiedChatCompletionException(
                RestStatus.INTERNAL_SERVER_ERROR,
                format("%s for request from inference entity id [%s]", SERVER_ERROR_OBJECT, request.getInferenceEntityId()),
                createErrorType(errorResponse),
                "stream_error"
            );
        }
    }

    private static String extractErrorCode(StreamingHuggingFaceErrorResponseEntity streamingHuggingFaceErrorResponseEntity) {
        return streamingHuggingFaceErrorResponseEntity.httpStatusCode() != null
            ? String.valueOf(streamingHuggingFaceErrorResponseEntity.httpStatusCode())
            : null;
    }

    /**
     * Represents a structured error response specifically for streaming operations
     * using HuggingFace APIs. This is separate from non-streaming error responses,
     * which are handled by {@link HuggingFaceErrorResponseEntity}.
     * An example error response for failed field validation for streaming operation would look like
     * <code>
     * {
     *   "error": "Input validation error: cannot compile regex from schema",
     *   "http_status_code": 422
     * }
     * </code>
     */
    private static class StreamingHuggingFaceErrorResponseEntity extends ErrorResponse {
        private static final ConstructingObjectParser<Optional<ErrorResponse>, Void> ERROR_PARSER = new ConstructingObjectParser<>(
            HUGGING_FACE_ERROR,
            true,
            args -> Optional.ofNullable((StreamingHuggingFaceErrorResponseEntity) args[0])
        );
        private static final ConstructingObjectParser<StreamingHuggingFaceErrorResponseEntity, Void> ERROR_BODY_PARSER =
            new ConstructingObjectParser<>(
                HUGGING_FACE_ERROR,
                true,
                args -> new StreamingHuggingFaceErrorResponseEntity(args[0] != null ? (String) args[0] : "unknown", (Integer) args[1])
            );

        static {
            ERROR_BODY_PARSER.declareString(ConstructingObjectParser.optionalConstructorArg(), new ParseField("message"));
            ERROR_BODY_PARSER.declareInt(ConstructingObjectParser.optionalConstructorArg(), new ParseField("http_status_code"));

            ERROR_PARSER.declareObjectOrNull(
                ConstructingObjectParser.optionalConstructorArg(),
                ERROR_BODY_PARSER,
                null,
                new ParseField("error")
            );
        }

        /**
         * Parses a streaming HuggingFace error response from a JSON string.
         *
         * @param response the raw JSON string representing an error
         * @return a parsed {@link ErrorResponse} or {@link ErrorResponse#UNDEFINED_ERROR} if parsing fails
         */
        private static ErrorResponse fromString(String response) {
            try (
                XContentParser parser = XContentFactory.xContent(XContentType.JSON)
                    .createParser(XContentParserConfiguration.EMPTY, response)
            ) {
                return ERROR_PARSER.apply(parser, null).orElse(ErrorResponse.UNDEFINED_ERROR);
            } catch (Exception e) {
                // swallow the error
            }

            return ErrorResponse.UNDEFINED_ERROR;
        }

        @Nullable
        private final Integer httpStatusCode;

        StreamingHuggingFaceErrorResponseEntity(String errorMessage, @Nullable Integer httpStatusCode) {
            super(errorMessage);
            this.httpStatusCode = httpStatusCode;
        }

        @Nullable
        public Integer httpStatusCode() {
            return httpStatusCode;
        }

    }
}
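The three-way fallback in buildMidStreamError can be read in isolation with a small stand-alone sketch. Everything below is hypothetical and simplified: StreamError, ParsedError, and build are illustrative names rather than Elasticsearch types, the message wording is invented, and the handling of the exception branch is only a guess at what UnifiedChatCompletionException.fromThrowable might produce.

```java
import java.util.Optional;

// Illustrative stand-ins only; none of these types exist in Elasticsearch.
final class MidStreamErrorSketch {

    // Hypothetical simplified error: a message plus "type" and "code" fields.
    record StreamError(String message, String type, String code) {}

    // Hypothetical parsed provider error; httpStatusCode may be null.
    record ParsedError(String message, Integer httpStatusCode) {}

    static StreamError build(Optional<ParsedError> parsed, Exception cause, String endpointId) {
        if (parsed.isPresent()) {
            // 1. The event parsed as a provider error: surface its message and status.
            ParsedError error = parsed.get();
            String code = error.httpStatusCode() != null ? String.valueOf(error.httpStatusCode()) : null;
            String message = "Error for inference entity id [" + endpointId + "]: [" + error.message() + "]";
            return new StreamError(message, "hugging_face_error", code);
        }
        if (cause != null) {
            // 2. Nothing parseable, but an exception is available: derive the error from it.
            //    (Assumption: the real conversion may populate these fields differently.)
            return new StreamError(String.valueOf(cause.getMessage()), cause.getClass().getSimpleName(), "stream_error");
        }
        // 3. Neither is available: fall back to a generic stream error.
        return new StreamError("Error for inference entity id [" + endpointId + "]", "unknown", "stream_error");
    }

    public static void main(String[] args) {
        System.out.println(build(Optional.of(new ParsedError("bad input", 422)), null, "my-endpoint"));
        System.out.println(build(Optional.empty(), new IllegalStateException("connection reset"), "my-endpoint"));
        System.out.println(build(Optional.empty(), null, "my-endpoint"));
    }
}
```

The ordering matters: a parsed provider error takes priority over the exception, so the caller sees the provider's own message whenever one is available.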
I think the error format for the code path that calls `buildError` will be different from that of a mid-stream error. For example, if I change the URL to be invalid for the HF endpoint (add another character or something), the error we get back doesn't include the message from HF:
If I use curl to perform a request to a URL that doesn't exist, this is returned:
I think the best solution would be to handle both error formats. What we can do is look at the token when we parse the `error` field and see whether it is an object or a string. Here's an example: https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/inference/UnifiedCompletionRequest.java#L286-L292
Once we do that we can probably move the error class from here `HuggingFaceErrorResponse` and replace `HuggingFaceErrorResponseEntity`.

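A minimal sketch of the "object or string" idea from the comment above, under stated assumptions: it works on an already-decoded JSON body represented as a Map rather than on parser tokens, and ErrorFieldSketch, ParsedError, and fromBody are hypothetical names. In the real parser the same branch would presumably be taken on the current token (a string value versus the start of an object).

```java
import java.util.Map;

// Illustrative only: accepts both {"error": "text", ...} and {"error": {"message": "text", ...}}.
final class ErrorFieldSketch {

    // Hypothetical holder for the fields of interest; httpStatusCode may be null.
    record ParsedError(String message, Integer httpStatusCode) {}

    static ParsedError fromBody(Map<String, ?> body) {
        Object error = body.get("error");
        if (error instanceof String text) {
            // Flat form: the message is the "error" value, the status sits beside it.
            Object status = body.get("http_status_code");
            return new ParsedError(text, status instanceof Integer i ? i : null);
        }
        if (error instanceof Map<?, ?> nested) {
            // Nested form: both fields live inside the "error" object.
            Object message = nested.get("message");
            Object status = nested.get("http_status_code");
            return new ParsedError(message instanceof String s ? s : "unknown", status instanceof Integer i ? i : null);
        }
        // No usable "error" field: let the caller fall back to an undefined error.
        return null;
    }

    public static void main(String[] args) {
        System.out.println(fromBody(Map.of("error", "Input validation error", "http_status_code", 422)));
        System.out.println(fromBody(Map.of("error", Map.of("message", "Input validation error", "http_status_code", 422))));
        System.out.println(fromBody(Map.of("detail", "no error field")));
    }
}
```

Returning null for an unrecognized body keeps the sketch short; a caller would map that case to whatever "undefined error" value it already uses.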
I'm afraid that is not the case when I do the testing. I just get a 404 with no response body. However, a different type of error would have an "error" field and nothing else. During testing I populated responseMap with "error message" and received this response from Elastic:
I think it is safe to assume that errors returned as part of the stream from HF will be handled inside the "buildMidStreamError" method using "StreamingHuggingFaceErrorResponseEntity.fromString".
Errors that are returned from HF NOT as a stream, but need to be returned as part of the stream from our side, will be handled by the "buildError" method using "HuggingFaceErrorResponseEntity.fromResponse".
Unit tests are added.
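The split described in this reply can be sketched as a small routing function. This is an illustration under assumptions, not the plugin's control flow: ErrorRoutingSketch, Path, Status, route, and codeFor are hypothetical names, and the rule "a non-2xx HTTP status goes to buildError, an error event inside a 2xx stream goes to buildMidStreamError" is an inference from the discussion. Only the lower-casing of the status name mirrors the diff, where buildError uses restStatus.name().toLowerCase(Locale.ROOT) as the error code.

```java
import java.util.Locale;
import java.util.Optional;

// Illustrative only: which handler method would see a given failure.
final class ErrorRoutingSketch {

    enum Path { BUILD_ERROR, BUILD_MID_STREAM_ERROR }

    // Hypothetical stand-in for a few status names.
    enum Status { NOT_FOUND, UNPROCESSABLE_ENTITY, INTERNAL_SERVER_ERROR }

    static Optional<Path> route(int httpStatus, boolean errorEventInStream) {
        if (httpStatus < 200 || httpStatus >= 300) {
            // The request failed before any stream started, e.g. a 404 with an empty body.
            return Optional.of(Path.BUILD_ERROR);
        }
        // The stream started; only an error event inside it needs handling.
        return errorEventInStream ? Optional.of(Path.BUILD_MID_STREAM_ERROR) : Optional.empty();
    }

    // Error code derived from the status name, as buildError does in the diff.
    static String codeFor(Status status) {
        return status.name().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(route(404, false) + " " + codeFor(Status.NOT_FOUND));
        System.out.println(route(200, true));
        System.out.println(route(200, false));
    }
}
```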