[ML] Refactor OpenAI request managers #124144
Conversation
@@ -27,6 +36,18 @@
 */
public class OpenAiActionCreator implements OpenAiActionVisitor {
    public static final String COMPLETION_ERROR_PREFIX = "OpenAI chat completions";
    public static final String USER_ROLE = "user";

    static final ResponseHandler COMPLETION_HANDLER = new OpenAiChatCompletionResponseHandler(
These changes basically move the logic from the request manager files into here.
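A rough sketch of what that wiring could look like in the creator, using the GenericRequestManager introduced later in this PR; the request constructor, input type, and SenderExecutableAction arguments are assumptions, not the exact code:

    @Override
    public ExecutableAction create(OpenAiChatCompletionModel model, Map<String, Object> taskSettings) {
        // The shared handler and error prefix defined above replace the logic
        // that previously lived in a dedicated request manager file. This
        // assumes the model satisfies the RateLimitGroupingModel contract.
        var manager = new GenericRequestManager<>(
            serviceComponents.threadPool(),
            model,
            COMPLETION_HANDLER,
            inputs -> new OpenAiChatCompletionRequest(inputs, model),
            ChatCompletionInput.class
        );
        return new SenderExecutableAction(sender, manager, COMPLETION_ERROR_PREFIX);
    }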
    // If two inference endpoints have the same information defining the group but different
    // rate limits, they should be in different groups; otherwise whoever initially created
    // the group would set the rate and the other inference endpoint's rate would be ignored.
    return new EndpointGrouping(rateLimitGroup, rateLimitSettings);
We don't need to recreate the object on each call.
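A minimal sketch of the fix, assuming both fields are final so the grouping can be built once in the constructor instead of on every call (the surrounding class is abbreviated):

    private final EndpointGrouping endpointGrouping;

    // In the constructor, after the final fields are assigned:
    this.endpointGrouping = new EndpointGrouping(rateLimitGroup, rateLimitSettings);

    // The accessor then returns the cached instance instead of allocating a new one.
    EndpointGrouping endpointGrouping() {
        return endpointGrouping;
    }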
        this.rateLimitSettings = rateLimitSettings;
    }

    BaseRequestManager(ThreadPool threadPool, RateLimitGroupingModel rateLimitGroupingModel) {
This is a stopgap. Once all the request managers are refactored, the old constructor can be removed.
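A sketch of how the two constructors could coexist during the migration; the field types and the RateLimitGroupingModel accessor names are assumptions:

    // Old constructor: unmigrated managers still pass the grouping pieces separately.
    BaseRequestManager(ThreadPool threadPool, String inferenceEntityId, Object rateLimitGroup, RateLimitSettings rateLimitSettings) {
        this.threadPool = Objects.requireNonNull(threadPool);
        this.inferenceEntityId = Objects.requireNonNull(inferenceEntityId);
        this.rateLimitGroup = Objects.requireNonNull(rateLimitGroup);
        this.rateLimitSettings = Objects.requireNonNull(rateLimitSettings);
    }

    // Stopgap constructor: refactored managers hand over a model that already
    // bundles the grouping hash and settings; once every manager is migrated,
    // the constructor above can be deleted.
    BaseRequestManager(ThreadPool threadPool, RateLimitGroupingModel rateLimitGroupingModel) {
        this(
            threadPool,
            rateLimitGroupingModel.inferenceEntityId(),
            rateLimitGroupingModel.rateLimitGroupingHash(),
            rateLimitGroupingModel.rateLimitSettings()
        );
    }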
 * This is a temporary class to use while we refactor all the request managers. After all the request managers extend
 * this class we'll move this functionality directly into the {@link BaseRequestManager}.
 */
public class GenericRequestManager<T extends InferenceInputs> extends BaseRequestManager {
Once all the request managers are refactored, I envision that we'll be able to move this logic up into the base class.
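An illustrative shape for the class, with the constructor parameters inferred from how a generic manager would have to be wired (not the exact signature):

    public class GenericRequestManager<T extends InferenceInputs> extends BaseRequestManager {
        private final ResponseHandler responseHandler;
        private final Function<T, Request> requestCreator;
        private final Class<T> inputType;

        public GenericRequestManager(
            ThreadPool threadPool,
            RateLimitGroupingModel rateLimitGroupingModel,
            ResponseHandler responseHandler,
            Function<T, Request> requestCreator,
            Class<T> inputType
        ) {
            super(threadPool, rateLimitGroupingModel); // the stopgap constructor above
            this.responseHandler = Objects.requireNonNull(responseHandler);
            this.requestCreator = Objects.requireNonNull(requestCreator);
            this.inputType = Objects.requireNonNull(inputType);
        }

        protected Request createRequest(InferenceInputs inferenceInputs) {
            // Narrow the untyped inputs and delegate to the factory; this is the
            // piece each provider previously duplicated in its own manager file.
            return requestCreator.apply(inputType.cast(inferenceInputs));
        }
    }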
import static org.elasticsearch.xpack.inference.common.Truncator.truncate;

public class TruncatingRequestManager extends BaseRequestManager {
Currently this would only be used for text embedding requests.
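A hedged sketch of the truncation step such a manager would perform before building an embeddings request; only the statically imported truncate comes from the diff above, the accessor names are assumptions:

    // Cut the input documents down to the model's token limit before the request
    // is created, so oversized text embedding payloads don't fail outright.
    var truncationResult = truncate(inputs, model.maxInputTokens());
    var request = requestCreator.apply(truncationResult);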
        this.truncationResult = Objects.requireNonNull(input);
        this.model = Objects.requireNonNull(model);
    }

    public HttpRequest createHttpRequest() {
-       HttpPost httpPost = new HttpPost(account.uri());
+       HttpPost httpPost = new HttpPost(model.uri());
I pushed all the account-related stuff into the model since we need it there to calculate the hash anyway.
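A sketch of the resulting request construction, assuming the model now exposes the connection details OpenAiAccount used to hold (the header helpers and HttpRequest constructor are abbreviated assumptions):

    public HttpRequest createHttpRequest() {
        // uri() and apiKey() live on the model because the rate limit grouping
        // hash is computed from them there anyway, so the separate account
        // object can be deleted.
        HttpPost httpPost = new HttpPost(model.uri());
        httpPost.setHeader(HttpHeaders.CONTENT_TYPE, XContentType.JSON.mediaType());
        httpPost.setHeader(createAuthBearerHeader(model.apiKey()));
        // ... entity and remaining headers unchanged ...
        return new HttpRequest(httpPost, getInferenceEntityId());
    }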
    );
    }

    public static URI buildDefaultUri() throws URISyntaxException {
It's public because it's used in a few tests.
@@ -62,4 +69,16 @@ public OpenAiRateLimitServiceSettings rateLimitServiceSettings() {
    }

    public abstract ExecutableAction accept(OpenAiActionVisitor creator, Map<String, Object> taskSettings);

    public int rateLimitGroupingHash() {
        return Objects.hash(rateLimitServiceSettings.modelId(), apiKey, uri);
We could probably calculate this only once. To avoid weird bugs, I suppose it's better to do it on every call in case one of the fields gets reset, although they shouldn't change since they're final.
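If we did decide to compute it once, a minimal sketch; because every input is final, the cached value can never go stale:

    private final int rateLimitGroupingHash;

    // In the constructor, after rateLimitServiceSettings, apiKey, and uri are assigned:
    this.rateLimitGroupingHash = Objects.hash(rateLimitServiceSettings.modelId(), apiKey, uri);

    public int rateLimitGroupingHash() {
        return rateLimitGroupingHash;
    }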
Pinging @elastic/ml-core (Team:ML)
||
public abstract int rateLimitGroupingHash(); | ||
|
||
public abstract RateLimitSettings rateLimitSettings(); |
Should we maybe (eventually) move this into Model, since I think everyone has RateLimitSettings anyway?
Good idea. There might be a few places, like the Bedrock implementation, that don't. I'll see if we can handle that elegantly.
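One hedged way to handle providers without rate limiting would be a null-object default on Model that implementations like Bedrock simply inherit; the DISABLED constant is hypothetical:

    public abstract class Model {
        // Overridden by every service that rate limits; services without rate
        // limits keep this default. DISABLED is a hypothetical null-object constant.
        public RateLimitSettings rateLimitSettings() {
            return RateLimitSettings.DISABLED;
        }
    }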
* Code compiling
* Removing OpenAiAccount
💚 Backport successful
This PR demonstrates how we can remove many of the RequestManager files within the inference API. I only did this for OpenAI as a demonstration. If we're ok with the approach, I can do it for the rest of the services.