Unable to convert PyTorch Tensor to CuPy Array
Description
We want to convert a PyTorch tensor on a GPU to a CuPy array, but when the tensor has a gradient (requires_grad=True) or has a bool dtype, the conversion fails with an error.
We have tried both cupy.asarray() and DLPack, following https://docs.cupy.dev/en/stable/user_guide/interoperability.html#pytorch, and both fail.
The error is: TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
We know this can be worked around with .cpu().numpy(), but that round-trips through host memory and is slow; we want to transfer the GPU PyTorch tensor directly to a GPU CuPy array.
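For context, a minimal sketch of the host round-trip mentioned above (variable names are illustrative); it works, but it copies GPU -> CPU -> GPU:

import torch
import cupy

# The slow path we want to avoid: device -> host (NumPy) -> device.
a = torch.tensor([True, False], device='cuda')
b = cupy.asarray(a.cpu().numpy())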
To Reproduce
# Test bool tensor
import torch
import cupy

a = torch.tensor([1.1], dtype=torch.bool, device='cuda')
cupy.asarray(a)  # raises the TypeError above

# Test gradient tensor
c = torch.tensor([1, 1], device='cuda', dtype=torch.float, requires_grad=True)
cupy.asarray(c)  # raises the TypeError above
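For comparison, a plain float tensor without requires_grad converts directly (a sketch; we assume CuPy consumes it through the CUDA array interface), so the failure appears specific to bool tensors and tensors carrying a gradient:

import torch
import cupy

# This case works: no gradient, non-bool dtype, data stays on the GPU.
d = torch.tensor([1.1], device='cuda', dtype=torch.float)
e = cupy.asarray(d)
print(e.dtype, e.device)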
Installation
Conda-Forge (conda install ...)
Environment
OS : Linux-5.4.0-89-generic-x86_64-with-glibc2.27
Python Version : 3.9.13
CuPy Version : 11.2.0
CuPy Platform : NVIDIA CUDA
NumPy Version : 1.23.1
SciPy Version : 1.8.1
Cython Build Version : 0.29.32
Cython Runtime Version : None
CUDA Root : /home/lthpc/.conda/envs/wjpytorch
nvcc PATH : None
CUDA Build Version : 11020
CUDA Driver Version : 11070
CUDA Runtime Version : 11060
cuBLAS Version : (available)
cuFFT Version : 10600
cuRAND Version : 10209
cuSOLVER Version : (11, 3, 2)
cuSPARSE Version : (available)
NVRTC Version : (11, 6)
Thrust Version : 101000
CUB Build Version : 101000
Jitify Build Version : 343be31
cuDNN Build Version : None
cuDNN Version : None
NCCL Build Version : None
NCCL Runtime Version : None
cuTENSOR Version : None
cuSPARSELt Build Version : None
Device 0 Name : NVIDIA A100 80GB PCIe
Device 0 Compute Capability : 80
Device 0 PCI Bus ID : 0000:18:00.0
Device 1 Name : NVIDIA A100 80GB PCIe
Device 1 Compute Capability : 80
Device 1 PCI Bus ID : 0000:3B:00.0
Device 2 Name : NVIDIA A100 80GB PCIe
Device 2 Compute Capability : 80
Device 2 PCI Bus ID : 0000:86:00.0
Device 3 Name : NVIDIA A100 80GB PCIe
Device 3 Compute Capability : 80
Device 3 PCI Bus ID : 0000:AF:00.0
Additional Information
No response
Top GitHub Comments
@Weigaa Think of asarray as a C++ copy constructor: a copy will occur in most cases, unless specific conditions are met, for example if the input object is also a CuPy array or is consumed via DLPack/CAI zero-copy, in which case H2D copies are not permitted by default (the same applies to from_dlpack, which implements the DLPack zero-copy protocol). The latter case should work via cupy.from_dlpack(c.detach()).
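A minimal sketch of that suggestion, plus a possible workaround for the bool case (the uint8 round-trip is our own idea, not from the CuPy docs, and it makes a small on-GPU copy):

import torch
import cupy

# Gradient tensor: detach first, then hand it to CuPy via DLPack (no host copy).
c = torch.tensor([1, 1], device='cuda', dtype=torch.float, requires_grad=True)
c_cp = cupy.from_dlpack(c.detach())

# Bool tensor: cast to uint8 on the GPU, convert, then cast back to bool in CuPy.
a = torch.tensor([True, False], device='cuda')
a_cp = cupy.from_dlpack(a.to(torch.uint8)).astype(cupy.bool_)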