Create the model Triton #1571

RobinPicard · 2025-05-09T08:48:04Z

This is a draft PR to validate the approach. It addresses issue #1551.

This PR adds an Outlines model that integrates with Triton servers running a TensorRT-LLM engine. It targets servers that expose an ensemble containing the preprocessing and postprocessing steps, so that the endpoint can accept simple HTTP requests. For instance:

curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "What is machine learning?", "max_tokens": 20, "bad_words": "", "stop_words": ""}'

The request above is taken from the README of the tensorrtllm_backend repository.
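
For reference, the same request can be sent from Python. The snippet below is only a minimal sketch using the standard library, not the code in this PR: the helper names `build_generate_request` and `generate` are hypothetical, and the URL and JSON fields are copied from the curl example above.

```python
import json
import urllib.request

# Endpoint copied from the curl example; adjust host and port to your server.
GENERATE_URL = "http://localhost:8000/v2/models/ensemble/generate"


def build_generate_request(prompt, max_tokens=20, url=GENERATE_URL):
    """Build a POST request carrying the same JSON fields as the curl example.

    Hypothetical helper for illustration; not part of Outlines or Triton.
    """
    payload = {
        "text_input": prompt,
        "max_tokens": max_tokens,
        "bad_words": "",
        "stop_words": "",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, max_tokens=20, url=GENERATE_URL, timeout=30.0):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_generate_request(prompt, max_tokens, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `generate("What is machine learning?")` should return the response body as a dict. The generated text is expected under a key such as `text_output`, but that field name is an assumption here and depends on how the ensemble is configured.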

rlouf · 2025-05-09T10:45:18Z

Is there a way we could use Triton's Python client?

Draft

c77e2d7

RobinPicard requested a review from rlouf May 9, 2025 08:48