CuPy to PyTorch with __cuda_array_interface__ - Array Strides not Multiple of Element Byte Size
PyTorch version:
torch.__version__: 1.3.0
CuPy config:
CuPy Version : 7.0.0rc1
CUDA Root : /usr/local/cuda-10.0
CUDA Build Version : 10000
CUDA Driver Version : 10020
CUDA Runtime Version : 10000
cuDNN Build Version : None
cuDNN Version : None
NCCL Build Version : None
NCCL Runtime Version : None
With PyTorch accepting __cuda_array_interface__, I’d expect to be able to use a CuPy-generated CUDA array in PyTorch without going through DLPack.
import cupy as cp
import torch
a = cp.random.rand(10000)
b = torch.as_tensor(a)
Throws: ValueError: given array strides not a multiple of the element byte size. Make a copy of the array to reallocate the memory.
If this is on PyTorch’s side, please let me know, and I’ll file an issue there. Thanks!
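(For reference, the DLPack round trip I'd like to avoid looks roughly like the following; this is only a minimal sketch using CuPy's toDlpack and torch.utils.dlpack.from_dlpack with the versions above.)
import cupy as cp
import torch
from torch.utils.dlpack import from_dlpack
a = cp.random.rand(10000)
# Export the CuPy array as a DLPack capsule, then wrap it as a PyTorch tensor.
# Zero-copy, but requires the extra DLPack step this issue asks to avoid.
b = from_dlpack(a.toDlpack())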
Top GitHub Comments
Also just a small note: using the following code (based off the code in the OP), we notice something peculiar.
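(The snippet from that comment is not reproduced here; as a minimal sketch, assuming the OP's a = cp.random.rand(10000), the peculiarity is that the resulting tensor ends up on the CPU.)
import cupy as cp
import torch
a = cp.random.rand(10000)
b = torch.as_tensor(a)   # no device specified
print(b.device)          # "cpu" -- the data was transferred to the host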
This happens because PyTorch has actually done a device-to-host transfer under the hood. To fix this we have to specify the device. One can also inspect __cuda_array_interface__ of both objects to confirm that this not only remains on device, but was a zero-copy conversion. Thus b uses the same memory as a.
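(A minimal sketch of that check, assuming the same a as above: passing an explicit device keeps the tensor on the GPU, and matching data pointers in the two interfaces indicate a zero-copy conversion.)
import cupy as cp
import torch
a = cp.random.rand(10000)
b = torch.as_tensor(a, device='cuda')   # keep the data on the GPU
# Matching (pointer, read-only) tuples mean b is a view over a's memory.
print(a.__cuda_array_interface__['data'])
print(b.__cuda_array_interface__['data'])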
Confirmed fixed with https://github.com/pytorch/pytorch/pull/24947