Not able to get 30+ fps processing speed on Nvidia RTX 2080 GPU
Hello, first off, thank you for sharing this amazing work. Much appreciated.
I wanted to report that I also could not get 30+ fps on an Nvidia RTX 2080 GPU with 8 GB of memory. With video I get 8-10 fps; with images I get ~16 fps (0.06 sec/image) with the Resnet-101 model, ~20 fps (0.05 sec/image) with the Resnet-50 model, and 17-18 fps (0.055 sec/image) with the Darknet-53 model. This is quite impressive, but it's roughly half of what is reported in the paper. For images, I used the Python timeit module to wrap the evalimage function to report my numbers (roughly as sketched below). It also seems odd that the difference in speed between the models is so small (especially between Resnet-101 and Resnet-50), which suggests to me that something is roughly halving the processing speed for all the models.
The command I am using is the following (changing the model name as needed):
python3 eval.py --trained_model=weights/yolact_resnet50_54_800000.pth --score_threshold=0.4 --top_k=100 --images=./test_images:./test_output_images
I also tried using --benchmark, but it made no difference to the numbers above.
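For reference, this is roughly how I timed the per-image path. It's a minimal sketch, not the exact harness: it assumes a callable with the same behavior as evalimage(net, path) from eval.py, and it calls torch.cuda.synchronize() so the GPU has actually finished before the timer stops.

```python
import timeit
import torch

def time_eval(eval_fn, net, image_path, repeats=20):
    """Rough per-image timing; eval_fn is expected to behave like eval.py's evalimage(net, path)."""
    def run_once():
        eval_fn(net, image_path)     # e.g. evalimage from eval.py (assumed signature)
        torch.cuda.synchronize()     # make sure queued GPU work has actually finished

    for _ in range(5):               # warm-up so first-call overhead doesn't skew the numbers
        run_once()

    total = timeit.timeit(run_once, number=repeats)
    per_image = total / repeats
    print(f"{per_image * 1000:.1f} ms/image  (~{1.0 / per_image:.1f} fps)")
```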
I was wondering if I could get some help to figure this out.
Top GitHub Comments
I'm actually really glad you asked that! When I timed it, that step took a whopping 19 ms, which didn't seem right at all. I then narrowed it down to this line, `torch.Tensor(frame).float().cuda()`, which took a full 16 ms on its own! Turns out most of that was coming from the `torch.Tensor` constructor, so I changed it to `torch.from_numpy(frame).float().cuda()`, but that still took 15 ms, most of which was coming from the `.float()` on the CPU. So, I once again rearranged it to get `torch.from_numpy(frame).cuda().float()`, which took only 1 ms.

So on the current master branch, step 1 takes 19 ms, but now it's down to 4. I'll push this along with my new rendering code and other speed improvements, probably later today. Note, though, that `evalvideo` is very multithreaded and the `torch.Tensor` constructor likely releases the GIL (as it's in C++), so this doesn't look like it had as huge an impact on evaluation (though it did take me from 28 fps on one video to 31).

@Rm1n90 Idk, I haven't tested it myself. It'll probably be slightly faster, but not that much (maybe 10%?)