`torch.cuda.current_device()` is changed by CuPy after 10.0
Description
Recently, @Yanqi-Chen reported a bug when using a PyTorch module accelerated by CuPy while training with Distributed Data Parallel (DDP). In DDP training, each process uses `torch.cuda.current_device()` as its default device, but he found that CuPy changes `torch.cuda.current_device()`. For example, when training with 4 GPUs, `torch.cuda.current_device()` should be 0, 1, 2, 3 in the respective processes; after using CuPy, however, every process's `torch.cuda.current_device()` outputs 0.
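For context, the per-process device setup in DDP typically looks like the sketch below (a generic illustration, not from the report; it assumes launch via `torchrun`, which sets the `LOCAL_RANK` environment variable). Anything that silently resets the current device breaks this invariant for every kernel launched afterwards.

```python
import os

import torch
import torch.distributed as dist

# Typical per-process setup when launched with torchrun: each process
# pins its own GPU and joins the process group over NCCL.
local_rank = int(os.environ['LOCAL_RANK'])
torch.cuda.set_device(local_rank)
dist.init_process_group(backend='nccl')

# Later kernel launches implicitly target the current device. If a
# library resets it behind PyTorch's back, every rank ends up on GPU 0.
assert torch.cuda.current_device() == local_rank
```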
To Reproduce
I ran the following code to reproduce the problem:

```python
import torch
import cupy

kernel_code = r'''
extern "C" __global__
void relu(const float* x, float *y, const int &N)
{
    const int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < N)
    {
        y[index] = (float) (x[index] >= 0.0f);
    }
}
'''

def relu(x: torch.Tensor):
    device_id = x.get_device()
    torch.cuda.set_device(device_id)
    print('1:', torch.cuda.current_device())
    y = torch.zeros_like(x)
    assert device_id >= 0
    with cupy.cuda.Device(device_id):
        kernel = cupy.RawKernel(kernel_code, 'relu')
        threads = 1024
        N = x.numel()
        blocks = (N + threads - 1) // threads
        x = x.contiguous()
        y = y.contiguous()
        N = cupy.asarray(N)
        kernel((blocks,), (threads,), (x.data_ptr(), y.data_ptr(), N))
    print('2:', torch.cuda.current_device())
    return y

device = 'cuda:1'
x = torch.rand([8], device=device) - 0.5
y = relu(x)
print(f'x={x}')
print(f'y={y}')
```
On machine A, I get the following output:

```
(pytorch-env) wfang@ubuntu:~/temp_dir$ python test.py
1: 1
2: 0
x=tensor([-0.1473, -0.3093, -0.0547, -0.1389,  0.4446, -0.3286, -0.4435,  0.0105],
       device='cuda:1')
y=tensor([0., 0., 0., 0., 1., 0., 0., 1.], device='cuda:1')
```
You can see that the default device changes from 1 to 0 after the `with cupy.cuda.Device(device_id):` block exits.
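The kernel launch itself does not seem necessary to trigger this; here is a stripped-down check (my reduction, not part of the original report) that isolates the context manager:

```python
import torch
import cupy

torch.cuda.set_device(1)
print(torch.cuda.current_device())  # 1

# Enter and immediately leave the CuPy device context. On machine A
# (CuPy 10.2) the current device comes back as 0, presumably because
# CuPy never saw the torch-side set_device call and restores the
# device it believes was current before entering.
with cupy.cuda.Device(1):
    pass

print(torch.cuda.current_device())  # 0 on machine A, 1 on machine B
```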
However, on machine B, I get:

```
(pytorch-env) wfang@Precision-5820-Tower-X-Series:~/tempdir$ python test.py
1: 1
2: 1
x=tensor([ 0.0060,  0.0141,  0.4118, -0.4813,  0.4609, -0.3557,  0.3739, -0.3464],
       device='cuda:1')
y=tensor([1., 1., 1., 0., 1., 0., 1., 0.], device='cuda:1')
```
On machine B, the default device is not changed. Note that machine B runs CuPy 9.4.0 while machine A runs CuPy 10.2.0 (see the environments below), consistent with the title: the behavior appears starting with CuPy 10.0.
Installation
Source (`pip install cupy`)
Environment
On machine A:
```
(pytorch-env) wfang@ubuntu:~/temp_dir$ conda list torch
# packages in environment at /home/wfang/anaconda3/envs/pytorch-env:
#
# Name                    Version   Build                        Channel
pytorch                   1.10.1    py3.9_cuda11.3_cudnn8.2.0_0  pytorch
pytorch-mutex             1.0       cuda                         pytorch
torchaudio                0.10.1    py39_cu113                   pytorch
torchvision               0.11.2    py39_cu113                   pytorch

(pytorch-env) wfang@ubuntu:~/temp_dir$ conda list cu
# packages in environment at /home/wfang/anaconda3/envs/pytorch-env:
#
# Name                    Version   Build                        Channel
cudatoolkit               11.3.1    h2bc3f7f_2                   defaults
cupy-cuda113              10.2.0    pypi_0                       pypi
ncurses                   6.3       h7f8727e_2                   defaults

(pytorch-env) wfang@ubuntu:~/temp_dir$ nvidia-smi
Sun Mar 20 19:41:32 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.74       Driver Version: 470.74       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:18:00.0 Off |                    0 |
| N/A   56C    P0   241W / 400W |  13828MiB / 81251MiB |     81%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM...  On   | 00000000:3B:00.0 Off |                    0 |
| N/A   61C    P0   218W / 400W |  13830MiB / 81251MiB |     97%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-SXM...  On   | 00000000:86:00.0 Off |                   98 |
| N/A   56C    P0   222W / 400W |  13828MiB / 81251MiB |     96%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-SXM...  On   | 00000000:AF:00.0 Off |                    0 |
| N/A   57C    P0   215W / 400W |  13828MiB / 81251MiB |     88%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     20830      C   ...vs/pytorch-env/bin/python    13817MiB |
|    1   N/A  N/A     20831      C   ...vs/pytorch-env/bin/python    13819MiB |
|    2   N/A  N/A     20832      C   ...vs/pytorch-env/bin/python    13817MiB |
|    3   N/A  N/A     20833      C   ...vs/pytorch-env/bin/python    13817MiB |
+-----------------------------------------------------------------------------+

(pytorch-env) wfang@ubuntu:~/temp_dir$ gpustat
ubuntu  Sun Mar 20 19:44:07 2022  470.74
[0] NVIDIA A100-SXM-80GB | 58'C, 100 % | 13828 / 81251 MB | wfang(13817M)
[1] NVIDIA A100-SXM-80GB | 63'C,  81 % | 13830 / 81251 MB | wfang(13819M)
[2] NVIDIA A100-SXM-80GB | 57'C,  91 % | 13828 / 81251 MB | wfang(13817M)
[3] NVIDIA A100-SXM-80GB | 59'C,  81 % | 13828 / 81251 MB | wfang(13817M)

(pytorch-env) wfang@ubuntu:/usr/local/cuda/bin$ ./nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
```
On machine B:
```
(pytorch-env) wfang@Precision-5820-Tower-X-Series:~/tempdir$ conda list torch
# packages in environment at /home/wfang/anaconda3/envs/pytorch-env:
#
# Name                    Version   Build                        Channel
pytorch                   1.10.1    py3.9_cuda11.3_cudnn8.2.0_0  pytorch
pytorch-mutex             1.0       cuda                         pytorch
torch-tb-profiler         0.3.1     pypi_0                       pypi
torchaudio                0.10.1    py39_cu113                   pytorch
torchvision               0.11.2    py39_cu113                   pytorch

(pytorch-env) wfang@Precision-5820-Tower-X-Series:~/tempdir$ conda list cu
# packages in environment at /home/wfang/anaconda3/envs/pytorch-env:
#
# Name                    Version   Build                        Channel
cudatoolkit               11.3.1    h2bc3f7f_2                   defaults
cupy-cuda111              9.4.0     pypi_0                       pypi
ncurses                   6.2       h58526e2_4                   conda-forge

(pytorch-env) wfang@Precision-5820-Tower-X-Series:~/tempdir$ nvidia-smi
Sun Mar 20 19:42:23 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:17:00.0 Off |                  N/A |
| 19%   36C    P8    16W / 250W |   1448MiB / 11011MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:B3:00.0 Off |                  N/A |
| 18%   36C    P8    21W / 250W |      3MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   2335870      C   python                           1445MiB |
+-----------------------------------------------------------------------------+

(pytorch-env) wfang@Precision-5820-Tower-X-Series:~/tempdir$ gpustat
Precision-5820-Tower-X-Series  Sun Mar 20 19:44:46 2022  465.19.01
[0] NVIDIA GeForce RTX 2080 Ti | 36'C,   0 % |  1448 / 11011 MB | wfang(1445M)
[1] NVIDIA GeForce RTX 2080 Ti | 36'C,   0 % |     3 / 11019 MB |

(pytorch-env) wfang@Precision-5820-Tower-X-Series:/usr/local/cuda/bin$ ./nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0
```
Additional Information
No response
Top GitHub Comments
Remove this line and it should work:

The "current device" is a semantic provided by CUDA, not by each individual library. `torch.cuda.set_device()` changes the current device of the current thread, so it takes effect on CuPy as well. Mixing multiple libraries to switch the current device may cause unexpected behavior.

OK, thanks!
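For reference, one way to apply that advice to the repro above is to stop mixing the two device-switching APIs: rely on `torch.cuda.set_device()` alone and drop the `cupy.cuda.Device` context manager. A minimal sketch (my adaptation, not posted in the thread; reuses `kernel_code` from the repro):

```python
def relu(x: torch.Tensor):
    device_id = x.get_device()
    assert device_id >= 0
    # torch.cuda.set_device() switches the CUDA current device for this
    # thread, and CuPy observes the same current device, so no separate
    # cupy.cuda.Device context is needed (and nothing restores the
    # device behind PyTorch's back on exit).
    torch.cuda.set_device(device_id)

    x = x.contiguous()
    y = torch.zeros_like(x)

    kernel = cupy.RawKernel(kernel_code, 'relu')
    threads = 1024
    N = x.numel()
    blocks = (N + threads - 1) // threads
    kernel((blocks,), (threads,), (x.data_ptr(), y.data_ptr(), cupy.asarray(N)))
    return y
```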