Not able to get 30+ fps processing speed on Nvidia RTX 2080 GPU
Hello, first off, thank you for sharing this amazing work. Much appreciated.
I wanted to report that I also could not get 30+ fps on an Nvidia RTX 2080 GPU with 8 GB of memory. With video I get 8-10 fps; with images I get ~16 fps (0.06 sec/image) with the Resnet-101 model, ~20 fps (0.05 sec/image) with the Resnet-50 model, and 17-18 fps (0.055 sec/image) with the Darknet-53 model. This is quite impressive, but it's roughly half of what is reported in the paper. For images, I used the Python timeit module to wrap the evalimage function to report my numbers (roughly as sketched below). It also seems odd that the difference in speed between the models is so small (especially between Resnet-101 and Resnet-50), which suggests to me that something is roughly halving the processing speed for all the models.
The command I am using is the following (changing the model name as needed):
python3 eval.py --trained_model=weights/yolact_resnet50_54_800000.pth --score_threshold=0.4 --top_k=100 --images=./test_images:./test_output_images
I also tried using --benchmark, but it made no difference to the numbers above.
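For reference, this is roughly how I timed the per-image path. It's a minimal sketch, not the exact harness: it assumes a callable with the same behavior as evalimage(net, path) from eval.py, and it calls torch.cuda.synchronize() so the GPU has actually finished before the timer stops.

```python
import timeit
import torch

def time_eval(eval_fn, net, image_path, repeats=20):
    """Rough per-image timing; eval_fn is expected to behave like eval.py's evalimage(net, path)."""
    def run_once():
        eval_fn(net, image_path)     # e.g. evalimage from eval.py (assumed signature)
        torch.cuda.synchronize()     # make sure queued GPU work has actually finished

    for _ in range(5):               # warm-up so first-call overhead doesn't skew the numbers
        run_once()

    total = timeit.timeit(run_once, number=repeats)
    per_image = total / repeats
    print(f"{per_image * 1000:.1f} ms/image  (~{1.0 / per_image:.1f} fps)")
```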
I was wondering if I could get some help to figure this out.
Top GitHub Comments
I'm actually really glad you asked that! When I timed it, that step took a whopping 19 ms, which didn't seem right at all. I then narrowed it down to this line, `torch.Tensor(frame).float().cuda()`, which took a full 16 ms on its own! Turns out most of that was coming from the `torch.Tensor` constructor, so I changed it to `torch.from_numpy(frame).float().cuda()`, but that still took 15 ms, most of which was coming from the `.float()` on the CPU. So, I once again rearranged it to get `torch.from_numpy(frame).cuda().float()`, which took only 1 ms.

So on the current master branch, step 1 takes 19 ms, but now it's down to 4. I'll push this along with my new rendering code and other speed improvements, probably later today. Note, though, that `evalvideo` is very multithreaded and the `torch.Tensor` constructor likely releases the GIL (as it's in C++), so this doesn't look like it had as huge an impact on evaluation (though it did take me from 28 fps on one video to 31).

@Rm1n90 Idk, I haven't tested it myself. It'll probably be slightly faster, but not that much (maybe 10%?)