Deploying a model using TRTIS is much slower than using the frozen model directly
See original GitHub issue

After optimizing YOLOv3 with TF-TRT, I ran inference in two ways:
- using TRTIS (~20 fps), which also produced the warning below:
  layout failed: Invalid argument: The graph is already optimized by layout optimizer.
- using the frozen graph directly (~54 fps)
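Throughput figures like the ~20 fps and ~54 fps above can be collected with a simple timing loop. This is a generic sketch, not code from the issue; the `measure_fps` helper and the dummy callable are illustrative stand-ins for a real `session.run()` or TRTIS client call:

```python
import time

def measure_fps(infer, n_warmup=10, n_iters=100):
    """Rough throughput of a zero-argument inference callable, in frames/sec."""
    # Warm-up runs exclude one-time costs (graph init, CUDA context, caches).
    for _ in range(n_warmup):
        infer()
    start = time.perf_counter()
    for _ in range(n_iters):
        infer()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed

if __name__ == "__main__":
    # Stand-in for a real per-frame inference call.
    fps = measure_fps(lambda: time.sleep(0.001))
    print(f"{fps:.1f} fps")
```

Measuring both paths with the same loop rules out timing methodology as the source of the gap.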
Issue Analytics
- State: Closed
- Created: 4 years ago
- Comments: 5 (3 by maintainers)
Top Results From Across the Web

deploying model using trtis is much slower than using frozen ...
It seems that using the direct optimizer on trtis is available in recent versions. I didn't update documents so I don't know yet....

Deploying Models from TensorFlow Model Zoo Using NVIDIA ...
Use the following code examples to optimize your TensorFlow network using TF-TRT, depending on your platform.

Optimizing TensorFlow Models for Serving | by Lukman Ramsey
There are several techniques in TensorFlow that allow you to shrink the size of a model and improve prediction latency. Here are some...

There are two very different ways to deploy ML models, here's ...
In this article, I'll provide you with a straightforward yet best-practices template for both kinds of deployment. As always, for the ...

Tensorflow inference becomes slower and slower due to eval ...
So I have a frozen tensorflow model, which I can use to classify images. When I try to use this model to inference...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Are you actually providing a labels.txt? Given the dimension of those outputs the file would need to have 10647*80=850k+ entries… but that isn’t related to your issue.
So you are using the offline TF-TRT conversion to create a graphdef, and then running that in TRTIS and also using some script that uses the TF API to load and run it directly?
What version of TRTIS are you using? Have you tried using perf_client with TRTIS and see what performance that gives you for various concurrency levels? Make sure you read this section of the documentation: https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/optimization.html
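Running an offline TF-TRT-converted graphdef under TRTIS also requires a model configuration. A hypothetical `config.pbtxt` sketch for this setup; the model name, tensor names, and dimensions are illustrative assumptions, not taken from this issue:

```
# Hypothetical TRTIS model config for a TF-TRT-optimized YOLOv3 graphdef.
# Tensor names and shapes below are placeholders, not from the issue.
name: "yolov3_trt"
platform: "tensorflow_graphdef"
max_batch_size: 1
input [
  {
    name: "input_1"
    data_type: TYPE_FP32
    dims: [ 416, 416, 3 ]
  }
]
output [
  {
    name: "detections"
    data_type: TYPE_FP32
    dims: [ 10647, 85 ]
  }
]
instance_group [ { count: 1, kind: KIND_GPU } ]
```

With a config like this in place, perf_client can be pointed at the model to sweep request concurrency levels, as the maintainer suggests.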
Closing. Please reopen with perf_client results if those still show a problem.