CUDA 10.2 faster than 11 on older hardware

While testing #1129 on the ADS side on bireli, we found this weird behavior: downgrading CUDA from 11.1 to 10.2 makes inference almost twice as fast.

bireli has a GeForce GTX TITAN X (2014). This is the output of time on our quick integrity test on bireli:

CUDA 11.1:

real    0m16.210s
user    0m52.484s
sys     0m3.251s

CUDA 10.2:

real    0m9.065s
user    0m6.337s
sys     0m1.478s

The performance is different with newer hardware, e.g. here is the same test on romane, which has an RTX A6000 (2020):

CUDA 11.1:

real    0m8.021s
user    0m12.308s
sys     0m2.939s
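To put numbers on "almost twice as fast", here is a small sketch that converts the `time` outputs above to seconds and compares them. The helper names (`parse_time`, `cpu_seconds`) and the `runs` dictionary are only illustrative; the values are copied from the outputs above.

```python
import re


def parse_time(value):
    """Convert a `time` field such as '0m16.210s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognized time format: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Timings copied from the `time` outputs in this issue.
runs = {
    "bireli CUDA 11.1": {"real": "0m16.210s", "user": "0m52.484s", "sys": "0m3.251s"},
    "bireli CUDA 10.2": {"real": "0m9.065s", "user": "0m6.337s", "sys": "0m1.478s"},
    "romane CUDA 11.1": {"real": "0m8.021s", "user": "0m12.308s", "sys": "0m2.939s"},
}


def cpu_seconds(run):
    """Total CPU time (user + sys) consumed by a run, in seconds."""
    return parse_time(run["user"]) + parse_time(run["sys"])


# Wall-clock ratio on bireli: CUDA 11.1 vs CUDA 10.2.
real_ratio = parse_time(runs["bireli CUDA 11.1"]["real"]) / parse_time(
    runs["bireli CUDA 10.2"]["real"]
)
# CPU-time ratio on bireli: CUDA 11.1 vs CUDA 10.2.
cpu_ratio = cpu_seconds(runs["bireli CUDA 11.1"]) / cpu_seconds(runs["bireli CUDA 10.2"])

print(f"wall-clock ratio on bireli: {real_ratio:.2f}x")  # 1.79x
print(f"CPU-time ratio on bireli:   {cpu_ratio:.2f}x")   # 7.13x
```

Note that the CPU-time gap is much larger than the wall-clock gap, so a good part of the extra time with CUDA 11.1 on bireli is spent on the CPU side across several threads.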

My guess:

So no, it's not as slow. I'm guessing this is because bireli's GPUs are older. A lot of people seem to have reported slower inference with CUDA 11 than with 10.2 (see https://github.com/pytorch/pytorch/issues/47908). CUDA 11.1 is shipped with cuDNN 8, whereas CUDA 10.2 is used with the older cuDNN 7. According to this issue, it could be related to the cuDNN version and to whether torch.backends.cudnn.benchmark == True, which seems to be the case for ivadomed, e.g. here: https://github.com/ivadomed/ivadomed/blob/7b76bf81a025cde3096fd1d686d6f3c0b8ce8f02/ivadomed/main.py#L28

However, after changing cudnn.benchmark = True to False in main.py and testing.py, I wasn't able to observe a meaningful time difference, but maybe I messed something up.
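One way to rule out "I messed something up" is to flip the flag right around the timed call and check its value there, instead of editing two files. The sketch below is not ivadomed code: `temporary_attr` is a hypothetical helper, and a stand-in object is used so the snippet runs without PyTorch installed. With PyTorch, the same helper could be pointed at `torch.backends.cudnn`.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_attr(obj, name, value):
    """Set obj.<name> to value inside the with-block, then restore it."""
    previous = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        # Restore the old value even if the timed code raises.
        setattr(obj, name, previous)


# Stand-in for torch.backends.cudnn so this runs without PyTorch.
fake_cudnn = SimpleNamespace(benchmark=True)

with temporary_attr(fake_cudnn, "benchmark", False):
    # The code being timed would go here; the flag is guaranteed off.
    print("benchmark inside block:", fake_cudnn.benchmark)

print("benchmark after block:", fake_cudnn.benchmark)

# With PyTorch installed (run_inference is a hypothetical function):
#   import torch
#   with temporary_attr(torch.backends.cudnn, "benchmark", False):
#       run_inference()
```

Setting the flag this close to the call avoids the case where another module sets it back to True after it was changed elsewhere.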

With this being said, this is low-priority stuff IMO because it shouldn’t have any impact on newer hardware. Just wanted to let everyone know about this.

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

2 reactions
hermancollin commented, Jun 13, 2022

I believe the issue here is that even tho I can downgrade cuda on bireli, romane should still be way faster regardless of the cuda version used because its gpus are much newer

I guess there are a lot of factors associated with it, even batch_size while inference makes a diff (comment upstream)

Hm very curious. The complexity of this issue just keeps getting worse as we dig into this can of worms.

The reason I said the quick benchmark I did wasn’t rigorous enough is that if you rely only on time, you need to run the function multiple times and average the durations to get a (somewhat) accurate representation.
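A minimal sketch of what "run it multiple times and average" could look like, using only the standard library. `time_repeated` and the stand-in workload are hypothetical; for a real GPU measurement the callable would also need to wait for the GPU to finish (e.g. by calling torch.cuda.synchronize()) before returning, because CUDA kernels are launched asynchronously.

```python
import statistics
import time


def time_repeated(fn, repeats=5, warmup=1):
    """Call fn several times and return (mean, stdev, durations) in seconds.

    Warmup calls are executed but not timed, so one-off costs such as
    model loading or cuDNN autotuning do not skew the average.
    """
    for _ in range(warmup):
        fn()
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    mean = statistics.mean(durations)
    # stdev needs at least two samples.
    stdev = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return mean, stdev, durations


def workload():
    """Stand-in for the inference call being benchmarked."""
    return sum(range(100_000))


mean, stdev, durations = time_repeated(workload, repeats=5, warmup=1)
print(f"mean {mean * 1000:.3f} ms, stdev {stdev * 1000:.3f} ms over {len(durations)} runs")
```

Reporting the standard deviation next to the mean makes it easier to tell a real CUDA 10 vs 11 difference from run-to-run noise.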

With that being said, I found something unexpected while profiling with CUDA 10 vs 11.

CUDA 10.2 on bireli

[Image: cuda10]

CUDA 11 on bireli

[Image: cuda11]

I know it's pretty hard to see anything on these (you can click on the image and the hover tooltip gives you more info). We were already aware internally of a performance difference between onnx and pt models. From what I can see, CUDA 10 uses the PyTorch model whereas CUDA 11 uses the onnx version.

I think this is probably the main reason why the performance is so volatile.

1 reaction
kanishk16 commented, Jun 13, 2022

I believe the issue here is that even tho I can downgrade cuda on bireli, romane should still be way faster regardless of the cuda version used because its gpus are much newer

IIUC the benchmarks reported in this issue are related to inference… Maybe we would observe some improvement in the training time as the newer GPUs reduce the training time, at least this is what the benchmarks portray.
