TRT inference poor performance vs. PyTorch with DINO model #3398
Comments
If possible, please use trtexec to benchmark the TRT performance; a sample command would be like the one below.
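The original command was not captured in this thread; a hypothetical trtexec benchmark invocation along those lines (the ONNX path and input name/shape here are placeholders):

```shell
trtexec --onnx=dino.onnx \
        --fp16 \
        --shapes=input:1x3x800x1333 \
        --dumpProfile --separateProfileRun
```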
Thanks.
I've requested access.
Checked with TRT 8.6 (TRT docker 23.10) on A100; the mean GPU time is 102.96 ms. So this doesn't look like a bug in TRT.
But the performance of FP16 and TF32 is basically the same; is this normal? It doesn't seem to meet expectations. @zerollzeng
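One thing worth checking is how the precisions were requested at build time: TF32 is enabled by default on Ampere, while FP16 must be set explicitly, and TRT still falls back to FP32/TF32 kernels for layers without FP16 implementations, which can make the two configurations land close together. A minimal sketch of the relevant flags (TensorRT Python API; not from the original thread):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# FP16 kernels are only considered when requested explicitly.
config.set_flag(trt.BuilderFlag.FP16)

# TF32 is on by default for FP32 layers on Ampere; clear it to
# benchmark strict FP32 as a baseline.
# config.clear_flag(trt.BuilderFlag.TF32)
```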
Hi @chenrui17, did you figure it out by any chance? I'm running into the same problem, although this is an old issue.
Trained model: DINO (link)


Firstly, use mmdeploy to convert the PyTorch model to ONNX format.
Secondly, use the TRT builder to generate an engine (a sketch of a typical build is shown below).
Finally, use the `execute_async_v2` method to run inference, but the resulting performance is poor compared to PyTorch: the Nsight profiling below shows a forward time of about 420 ms+, while PyTorch inference time is about 180 ms. The nsys files are below.
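For context, a minimal sketch of what the build step typically looks like with the TensorRT Python API (an illustration, not the reporter's actual code; the ONNX and engine paths are placeholders):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse the ONNX file exported by mmdeploy ("dino.onnx" is a placeholder).
with open("dino.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # optional: allow FP16 kernels
engine_bytes = builder.build_serialized_network(network, config)
with open("dino.engine", "wb") as f:
    f.write(engine_bytes)
```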
My question is: what is the problem, and how can I further analyze and optimize the performance?
BTW, below is my TRT inference code; please check. Thanks.
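The attached code is not reproduced here; as a stand-in, a minimal sketch of a typical `execute_async_v2` inference pass, assuming a prebuilt engine with static shapes, the input at binding 0, and a placeholder engine path:

```python
import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("dino.engine", "rb") as f:  # placeholder path
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()
stream = cuda.Stream()

# One pinned host buffer and one device buffer per binding
# (assumes all binding shapes are static).
host_bufs, dev_bufs = [], []
for i in range(engine.num_bindings):
    shape = context.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = cuda.pagelocked_empty(trt.volume(shape), dtype)
    host_bufs.append(host)
    dev_bufs.append(cuda.mem_alloc(host.nbytes))

# Fill the input (binding 0 assumed to be the input), copy H2D,
# launch asynchronously, copy outputs D2H, then sync once.
host_bufs[0][:] = np.random.rand(host_bufs[0].size).astype(host_bufs[0].dtype)
cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
context.execute_async_v2([int(d) for d in dev_bufs], stream.handle)
for i in range(1, engine.num_bindings):
    cuda.memcpy_dtoh_async(host_bufs[i], dev_bufs[i], stream)
stream.synchronize()
```

One common source of unfair comparisons when timing such a loop: a PyTorch forward usually keeps tensors on the GPU, so the H2D/D2H copies should be included or excluded consistently on both sides.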