
Create the model Triton #1571


Draft: RobinPicard wants to merge 1 commit into base v1.0

Conversation

RobinPicard
Contributor

This is a draft PR to validate the approach used; it addresses issue #1551.

This PR creates an Outlines model that integrates with Triton servers running a TensorRT-LLM engine. The model is designed for servers that rely on an ensemble containing preprocessing and postprocessing steps, so that the endpoint can accept plain HTTP requests. For instance:

curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "What is machine learning?", "max_tokens": 20, "bad_words": "", "stop_words": ""}'

The request above is taken from the README of the tensorrtllm_backend repository.
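To make the intended request flow concrete, here is a minimal Python sketch of a call to such an endpoint (the generate helper and base URL are hypothetical; the endpoint path and payload fields come from the curl example above, and the "text_output" response field is an assumption based on the ensemble's usual output name):

import requests

def generate(prompt: str, max_tokens: int = 20, base_url: str = "http://localhost:8000") -> str:
    # Hypothetical helper: post to the ensemble's generate endpoint shown above.
    # Payload field names follow the tensorrtllm_backend example.
    payload = {
        "text_input": prompt,
        "max_tokens": max_tokens,
        "bad_words": "",
        "stop_words": "",
    }
    response = requests.post(f"{base_url}/v2/models/ensemble/generate", json=payload)
    response.raise_for_status()
    # "text_output" is assumed to be the ensemble's output field name.
    return response.json()["text_output"]

print(generate("What is machine learning?"))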

RobinPicard requested a review from rlouf on May 9, 2025 at 08:48
rlouf (Member) commented May 9, 2025

Is there a way we could use Triton's Python client?
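For context, a call through the official tritonclient package against the same ensemble might look roughly like the sketch below (not part of this PR; the tensor names, dtypes, and shapes follow common tensorrtllm_backend ensemble configurations and should be treated as assumptions):

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def string_input(name: str, value: str) -> httpclient.InferInput:
    # Triton string tensors are sent as BYTES; shape [1, 1] assumes a single request.
    tensor = httpclient.InferInput(name, [1, 1], "BYTES")
    tensor.set_data_from_numpy(np.array([[value.encode()]], dtype=object))
    return tensor

max_tokens = httpclient.InferInput("max_tokens", [1, 1], "INT32")
max_tokens.set_data_from_numpy(np.array([[20]], dtype=np.int32))

inputs = [
    string_input("text_input", "What is machine learning?"),
    string_input("bad_words", ""),
    string_input("stop_words", ""),
    max_tokens,
]

result = client.infer(
    "ensemble",
    inputs,
    outputs=[httpclient.InferRequestedOutput("text_output")],
)
print(result.as_numpy("text_output"))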
