Unable to convert PyTorch Tensor to CuPy Array
Description
We want to convert a PyTorch tensor on a GPU to a CuPy array, but when the tensor has a gradient (requires_grad=True) or has a bool dtype, the conversion fails with an error.
We have tried both cupy.asarray() and DLPack, following https://docs.cupy.dev/en/stable/user_guide/interoperability.html#pytorch, and both fail.
The error is: TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
We know this can be worked around with .cpu().numpy(), but that round-trips through host memory and is slow; we want to transfer the GPU PyTorch tensor directly to a GPU CuPy array.
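For context, a minimal sketch of the host round-trip mentioned above (variable names are illustrative); it works, but it copies GPU -> CPU -> GPU:

import torch
import cupy

# The slow path we want to avoid: device -> host (NumPy) -> device.
a = torch.tensor([True, False], device='cuda')
b = cupy.asarray(a.cpu().numpy())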
To Reproduce
# Test bool tensor
import torch
import cupy

a = torch.tensor([1.1], dtype=torch.bool, device='cuda')
cupy.asarray(a)  # raises the TypeError above

# Test gradient tensor
c = torch.tensor([1, 1], device='cuda', dtype=torch.float, requires_grad=True)
cupy.asarray(c)  # raises the TypeError above
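For comparison, a plain float tensor without requires_grad converts directly (a sketch; we assume CuPy consumes it through the CUDA array interface), so the failure appears specific to bool tensors and tensors carrying a gradient:

import torch
import cupy

# This case works: no gradient, non-bool dtype, data stays on the GPU.
d = torch.tensor([1.1], device='cuda', dtype=torch.float)
e = cupy.asarray(d)
print(e.dtype, e.device)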
Installation
Conda-Forge (conda install ...)
Environment
OS : Linux-5.4.0-89-generic-x86_64-with-glibc2.27
Python Version : 3.9.13
CuPy Version : 11.2.0
CuPy Platform : NVIDIA CUDA
NumPy Version : 1.23.1
SciPy Version : 1.8.1
Cython Build Version : 0.29.32
Cython Runtime Version : None
CUDA Root : /home/lthpc/.conda/envs/wjpytorch
nvcc PATH : None
CUDA Build Version : 11020
CUDA Driver Version : 11070
CUDA Runtime Version : 11060
cuBLAS Version : (available)
cuFFT Version : 10600
cuRAND Version : 10209
cuSOLVER Version : (11, 3, 2)
cuSPARSE Version : (available)
NVRTC Version : (11, 6)
Thrust Version : 101000
CUB Build Version : 101000
Jitify Build Version : 343be31
cuDNN Build Version : None
cuDNN Version : None
NCCL Build Version : None
NCCL Runtime Version : None
cuTENSOR Version : None
cuSPARSELt Build Version : None
Device 0 Name : NVIDIA A100 80GB PCIe
Device 0 Compute Capability : 80
Device 0 PCI Bus ID : 0000:18:00.0
Device 1 Name : NVIDIA A100 80GB PCIe
Device 1 Compute Capability : 80
Device 1 PCI Bus ID : 0000:3B:00.0
Device 2 Name : NVIDIA A100 80GB PCIe
Device 2 Compute Capability : 80
Device 2 PCI Bus ID : 0000:86:00.0
Device 3 Name : NVIDIA A100 80GB PCIe
Device 3 Compute Capability : 80
Device 3 PCI Bus ID : 0000:AF:00.0
Additional Information
No response
Top GitHub Comments
@Weigaa Think of asarray as a C++ copy constructor: a copy will occur in most cases, unless specific conditions are met, for example if the input object is also a CuPy array or is consumed via DLPack/CAI zero-copy, in which case H2D copies are not permitted by default (the same applies to from_dlpack, which implements the DLPack zero-copy protocol). The latter case should work via cupy.from_dlpack(c.detach()).
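A minimal sketch of that suggestion, plus a possible workaround for the bool case (the uint8 round-trip is our own idea, not from the CuPy docs, and it makes a small on-GPU copy):

import torch
import cupy

# Gradient tensor: detach first, then hand it to CuPy via DLPack (no host copy).
c = torch.tensor([1, 1], device='cuda', dtype=torch.float, requires_grad=True)
c_cp = cupy.from_dlpack(c.detach())

# Bool tensor: cast to uint8 on the GPU, convert, then cast back to bool in CuPy.
a = torch.tensor([True, False], device='cuda')
a_cp = cupy.from_dlpack(a.to(torch.uint8)).astype(cupy.bool_)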