deform_conv2d, CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

🐛 Bug

Opening a new issue for the bug reported here; I could reproduce it as well on a nightly build: https://github.com/pytorch/vision/issues/2598#issuecomment-896921180

Thanks to @Queuecumber

1.10.0.dev20210726+cu111 0.11.0a0+c51f8c1
Total memory used before DFC call: 5.321267579510999%
Traceback (most recent call last):
  File "repro_vision_2598.py", line 58, in <module>
    test_out = dfc(test_in, test_offset)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1060, in _call_impl
    return forward_call(*input, **kwargs)
  File "repro_vision_2598.py", line 40, in forward
    res = deform_conv2d(input=x, offset=offset, weight=self.weight, stride=_pair(self.stride), padding=_pair(self.padding), dilation=_pair(self.dilation), mask=mask)
  File "/vision/torchvision/ops/deform_conv.py", line 89, in deform_conv2d
    return torch.ops.torchvision.deform_conv2d(
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

Environment

  • PyTorch / torchvision Version (e.g., 1.0 / 0.4.0):
  • OS (e.g., Linux):
  • How you installed PyTorch / torchvision (conda, pip, source):
  • Build command you used (if compiling from source):
  • Python version:
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • Any other relevant information:

Additional context

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 12 (1 by maintainers)

Top GitHub Comments

2 reactions
ngimel commented, Aug 12, 2021

I still couldn’t repro it on a 16 GB card (was getting honest OOMs), but I think what’s happening is that for bs=23, columns has more than 2**31 elements https://github.com/pytorch/vision/blob/7d52be76c8eaf02b12338afe0822396ab3547fe2/torchvision/csrc/ops/cuda/deform_conv2d_kernel.cu#L1079-L1080 (217055232, to be exact), and the im2col kernel is using int32 addressing, so some address computations are overflowing. The fix would be to either make the kernel templated and use int64 index computation when necessary, or, instead of limiting n_parallel_imgs to const kMaxParallelImgs, compute n_parallel_imgs in such a way that columns has fewer than 2**31 elements.
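
To make the second option concrete, here is a minimal Python sketch of choosing n_parallel_imgs so that columns stays under 2**31 elements. It is not the torchvision code or the fix that was merged. It assumes columns has the layout (in_channels * kernel_h * kernel_w) x (n_parallel_imgs * out_h * out_w) and that the chunk size must divide the batch size; the cap of 32 and the helper names `columns_numel` and `pick_n_parallel_imgs` are hypothetical.

```python
INT32_LIMIT = 2**31        # counts at or above this overflow signed 32-bit indexing
K_MAX_PARALLEL_IMGS = 32   # assumed cap, standing in for kMaxParallelImgs


def columns_numel(in_channels, kernel_h, kernel_w, n_parallel_imgs, out_h, out_w):
    """Element count of the im2col buffer under the assumed layout:
    (in_channels * kernel_h * kernel_w) x (n_parallel_imgs * out_h * out_w)."""
    return in_channels * kernel_h * kernel_w * n_parallel_imgs * out_h * out_w


def pick_n_parallel_imgs(batch_size, in_channels, kernel_h, kernel_w, out_h, out_w,
                         cap=K_MAX_PARALLEL_IMGS):
    """Return the largest divisor of batch_size, at most cap, for which
    columns has fewer than 2**31 elements."""
    for n in range(min(batch_size, cap), 0, -1):
        if batch_size % n != 0:
            continue  # only chunk sizes that split the batch evenly (assumption)
        if columns_numel(in_channels, kernel_h, kernel_w, n, out_h, out_w) < INT32_LIMIT:
            return n
    # Even one image per chunk is too large; only 64-bit indexing would help.
    raise ValueError("columns exceeds 2**31 elements even with n_parallel_imgs=1")
```

With illustrative sizes of 64 input channels, a 3x3 kernel, and a 512x512 output, a batch of 23 would need a columns buffer well above 2**31 elements if processed in one chunk, so the sketch falls back to one image at a time (23 is prime), while a batch of 24 can still be processed in chunks of 12. When even a single image exceeds the limit, only the first option, int64 index computation, applies.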

1 reaction
Queuecumber commented, Sep 23, 2021

Seems to be working, thanks a lot for the fix and sorry for my late reply
