Is there any way to reduce the GPU memory usage and enhance the inference speed?
M-LSD's pred_lines is slower than I expected, running at about 6 Hz end to end (including other processing; M-LSD-tiny only seems to reach about 10 Hz).
It also uses about 2 GB of GPU memory.
Is there a way to reduce the GPU memory usage and enhance the inference speed? (including TensorRT, etc.)
Please give me some advice, as I'm not an expert in this.
Thanks!
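(For reference, a minimal sketch of the cheap framework-level savings, assuming the PyTorch port of M-LSD where pred_lines wraps a model forward pass; `model` and the 512x512 input below are placeholders, not values taken from this repo.)

```python
import torch

# Placeholder: `model` stands in for whichever M-LSD variant you load.
model = model.eval().cuda().half()             # FP16 weights roughly halve memory
img = torch.randn(1, 3, 512, 512, device="cuda").half()  # placeholder input shape

torch.backends.cudnn.benchmark = True          # let cuDNN pick faster kernels for a fixed input size

with torch.inference_mode():                   # no autograd bookkeeping -> less memory and overhead
    out = model(img)
```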

@rhysdg Thank you for the detailed explanation! Yeah, I'm looking to deploy with Nvidia Jetson as well, and with my personal laptop for practice.
It gave me really nice insight! Thank you again!
@JinraeKim @lhwcv Apologies for the late reply, busy times! For sure, the main goal with TensorRT is to reduce latency, and therefore increase inference speed pretty significantly, with minimal reduction in quality at FP16. Given a successful conversion you should also see a significant reduction in memory allocation overhead.
It's worth bearing in mind that the setup I have here was developed for Jetson-series devices, although my understanding is that it plays nicely with Nvidia's NGC PyTorch Docker container. I'm hoping to bring in a TensorRT Python API / PyCuda version shortly that should work across a wider range of devices. What were you hoping to deploy with, @JinraeKim?
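(For anyone following the ONNX-to-TensorRT route described above, a rough sketch, not the exact setup from this repo: it assumes a PyTorch M-LSD `model` object and the TensorRT 8.x Python API; the file names, input shape, and opset version are placeholders.)

```python
import torch
import tensorrt as trt

# 1) Export the PyTorch model to ONNX (placeholder input shape and file name).
model = model.eval().cuda()
dummy = torch.randn(1, 3, 512, 512, device="cuda")
torch.onnx.export(model, dummy, "mlsd.onnx", opset_version=11,
                  input_names=["input"], output_names=["output"])

# 2) Build an FP16 TensorRT engine from the ONNX graph.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("mlsd.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)   # FP16 is where most of the speed/memory gain comes from
engine = builder.build_serialized_network(network, config)
with open("mlsd_fp16.engine", "wb") as f:
    f.write(engine)
```

A serialized engine like this can later be deserialized with trt.Runtime (or benchmarked with trtexec) to check the actual latency and memory use on the target device.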