`import kornia` breaks CUDA lazy init
Describe the bug
Hi there. Thank you for providing a great library!
I found a bug related to CUDA initialization in kornia==0.6.8.
The bug occurs when setting the `CUDA_VISIBLE_DEVICES` environment variable from inside a Python script.
Reproduction steps
Below is my workstation configuration:
$ nvidia-smi -L
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-749126bf-d321-4567-7c2d-6b03eced6bb6)
GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-f326dc87-2766-a6cd-2c76-9d3860157f88)
GPU 2: NVIDIA A100-SXM4-40GB (UUID: GPU-feae10c6-1b2e-88e2-19a2-d1787eb645b6)
GPU 3: NVIDIA A100-SXM4-40GB (UUID: GPU-343edb27-9d92-2038-65fd-5596aff0bf33)
GPU 4: NVIDIA A100-SXM4-40GB (UUID: GPU-83bd83b6-f178-0b19-6216-10e8129a67ee)
GPU 5: NVIDIA A100-SXM4-40GB (UUID: GPU-4e238c07-cda8-5fdd-ae0b-b961e4bb0b29)
GPU 6: NVIDIA A100-SXM4-40GB (UUID: GPU-0388c595-fb53-1000-e6c3-b226f77b8a21)
GPU 7: NVIDIA A100-SXM4-40GB (UUID: GPU-fed61f6c-f856-e1ff-4705-539a72e61848)
Then I run the script below, which is intended to select GPUs from within the script itself.
import kornia
import os
import torch
os.environ["CUDA_VISIBLE_DEVICES"] = "32"
print(torch.__version__, torch.cuda.device_count())
The output is:
$ python test.py
1.12.0+cu113 8
Expected behavior
The following output would be correct:
$ python test.py
1.12.0+cu113 0
The `CUDA_VISIBLE_DEVICES` environment variable is used to switch or limit the GPUs visible to a process (see the NVIDIA documentation). It is typically used like this:
CUDA_VISIBLE_DEVICES="1,2" python train.py
or
import os
os.environ["CUDA_VISIBLE_DEVICES"]="1"
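Crucially, the in-script variant only works if the assignment happens before anything initializes CUDA: the value is captured at first initialization, and later changes to `os.environ` are ignored. The ordering problem can be sketched without torch, using a plain environment read as a stand-in for CUDA initialization:

```python
import os

# Stand-in for eager CUDA initialization at import time: the value is
# captured once and never re-read afterwards.
os.environ.pop("CUDA_VISIBLE_DEVICES", None)
captured_at_import = os.environ.get("CUDA_VISIBLE_DEVICES")  # None

# The user sets the variable afterwards, as in the reproduction script.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# A lazy initializer would see the new value; the eager one already
# missed it, which is exactly the kornia bug reported above.
seen_by_lazy_init = os.environ.get("CUDA_VISIBLE_DEVICES")

print(captured_at_import, seen_by_lazy_init)
```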
For example, YOLOv5 uses the following method for GPU selection. https://github.com/ultralytics/yolov5/pull/8497
The same bug also occurred in PyTorch and has already been fixed (in an issue I reported): https://github.com/pytorch/pytorch/issues/80876
I think the library should not initialize CUDA at import time, so that the user keeps control over it. CUDA is initialized as soon as `torch.cuda.is_available()` is called during kornia's import. The bug is caused by this part.
We can fix it like PyTorch official repo. https://github.com/pytorch/pytorch/pull/80899/files
- if torch.cuda.is_available():
+ # nvFuser imports are conditional on being compiled with CUDA
+ if hasattr(torch._C, "_nvfuser"):
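The difference between the two guards can be illustrated with a torch-free sketch, using stand-ins for `torch.cuda.is_available` and `torch._C` (hypothetical objects; real torch is deliberately not imported here, since importing it is beside the point):

```python
# Record whether "CUDA" gets touched by each guard.
init_events = []

def is_available():
    """Stand-in for torch.cuda.is_available(): it answers the question,
    but initializes the CUDA runtime as a side effect."""
    init_events.append("cuda_init")
    return True

class FakeTorchC:
    """Stand-in for torch._C when PyTorch was compiled with CUDA."""
    _nvfuser = object()

# Old guard: answers the question by touching the runtime.
if is_available():
    pass
assert init_events == ["cuda_init"]

# New guard: a plain attribute lookup on the extension module, so merely
# importing the library leaves CUDA uninitialized.
init_events.clear()
if hasattr(FakeTorchC, "_nvfuser"):
    pass
assert init_events == []
```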
If there are no objections, I will open a pull request with this fix.
Environment
wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
- PyTorch Version (e.g., 1.0):
- OS (e.g., Linux):
- How you installed PyTorch (conda, pip, source):
- Build command you used (if compiling from source):
- Python version:
- CUDA/cuDNN version:
- GPU models and configuration:
- Any other relevant information:
Collecting environment information...
PyTorch version: 1.12.1+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.10.5 (main, Jul 6 2022, 16:47:12) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.4.0-125-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.4.152
GPU models and configuration:
GPU 0: NVIDIA A100-SXM4-40GB
GPU 1: NVIDIA A100-SXM4-40GB
GPU 2: NVIDIA A100-SXM4-40GB
GPU 3: NVIDIA A100-SXM4-40GB
GPU 4: NVIDIA A100-SXM4-40GB
GPU 5: NVIDIA A100-SXM4-40GB
GPU 6: NVIDIA A100-SXM4-40GB
GPU 7: NVIDIA A100-SXM4-40GB
Nvidia driver version: 470.141.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.5.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.1
[pip3] torch==1.12.1+cu113
[pip3] torch-tb-profiler==0.4.0
[pip3] torchaudio==0.12.1+cu113
[pip3] torchvision==0.13.1+cu113
[conda] Could not collect
Additional context
No response
Issue Analytics
- Created a year ago
- Comments: 6
Top GitHub Comments
Not related to the problem, but @shijianjian's suggestion will also improve the import time a bit (from ~8 ms to ~0.2 ms).
@shijianjian you are right. Let me fix that in adalam.