
`import kornia` breaks CUDA lazy init


Describe the bug

Hi there. Thank you for providing a great library!

I found a bug related to CUDA initialization in kornia==0.6.8.

The bug shows up when setting the `CUDA_VISIBLE_DEVICES` environment variable from inside a Python script.

Reproduction steps

Below is my workstation's GPU configuration:

$ nvidia-smi -L
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-749126bf-d321-4567-7c2d-6b03eced6bb6)
GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-f326dc87-2766-a6cd-2c76-9d3860157f88)
GPU 2: NVIDIA A100-SXM4-40GB (UUID: GPU-feae10c6-1b2e-88e2-19a2-d1787eb645b6)
GPU 3: NVIDIA A100-SXM4-40GB (UUID: GPU-343edb27-9d92-2038-65fd-5596aff0bf33)
GPU 4: NVIDIA A100-SXM4-40GB (UUID: GPU-83bd83b6-f178-0b19-6216-10e8129a67ee)
GPU 5: NVIDIA A100-SXM4-40GB (UUID: GPU-4e238c07-cda8-5fdd-ae0b-b961e4bb0b29)
GPU 6: NVIDIA A100-SXM4-40GB (UUID: GPU-0388c595-fb53-1000-e6c3-b226f77b8a21)
GPU 7: NVIDIA A100-SXM4-40GB (UUID: GPU-fed61f6c-f856-e1ff-4705-539a72e61848)

Then I run the script below, which tries to select GPUs from within the script:

import kornia  # this import already runs a torch.cuda query (the bug)
import os
import torch

# "32" matches no physical GPU, so no devices should be visible after this.
os.environ["CUDA_VISIBLE_DEVICES"] = "32"

print(torch.__version__, torch.cuda.device_count())

The actual output:

$ python test.py
1.12.0+cu113 8

Expected behavior

The expected output would be:

$ python test.py
1.12.0+cu113 0

The `CUDA_VISIBLE_DEVICES` environment variable is used to switch between or limit the GPUs visible to a process (see the NVIDIA reference). It is typically used in one of two ways:

CUDA_VISIBLE_DEVICES="1,2" python train.py

or

import os

os.environ["CUDA_VISIBLE_DEVICES"]="1"

For example, YOLOv5 uses this method for GPU selection: https://github.com/ultralytics/yolov5/pull/8497

The same bug also occurred in PyTorch itself and has since been fixed (my own issue): https://github.com/pytorch/pytorch/issues/80876

I think the library should not initialize CUDA at import time, so that the user keeps control over it. CUDA state is initialized as soon as a `torch.cuda` query such as `torch.cuda.is_available()` is executed.
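
A minimal sketch of that behavior (assuming a multi-GPU machine; the counts are illustrative):

import os
import torch

# The first CUDA query reads CUDA_VISIBLE_DEVICES and initializes CUDA;
# the set of visible devices is then fixed for the rest of the process.
print(torch.cuda.device_count())  # e.g. 8

os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # too late: ignored from now on
print(torch.cuda.device_count())  # still 8, not 1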

The bug is caused by these two places:

https://github.com/kornia/kornia/blob/db988652b13f46ac0dfedc5f883e11a848a6ca07/kornia/feature/adalam/adalam.py#L31

https://github.com/kornia/kornia/blob/db988652b13f46ac0dfedc5f883e11a848a6ca07/kornia/utils/helpers.py#L12-L28

We can fix it the same way the official PyTorch repo did: https://github.com/pytorch/pytorch/pull/80899/files

- if torch.cuda.is_available():
+ # nvFuser imports are conditional on being compiled with CUDA
+ if hasattr(torch._C, "_nvfuser"):

If there is no objection, I will open a pull request along these lines.
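
For kornia, a minimal sketch of the same idea (the names `_default_device` and `Matcher` are hypothetical, not kornia's actual API): defer the CUDA check from import time to first use.

import torch

def _default_device() -> torch.device:
    # Evaluated only when first needed, so importing the library never
    # touches CUDA and the user can still set CUDA_VISIBLE_DEVICES first.
    return torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

class Matcher:
    def __init__(self, device=None):
        # Resolve the default lazily in the constructor instead of in a
        # module-level default argument evaluated at import time.
        self.device = device if device is not None else _default_device()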

Environment

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
Collecting environment information...
PyTorch version: 1.12.1+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.10.5 (main, Jul  6 2022, 16:47:12) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.4.0-125-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.4.152
GPU models and configuration:
GPU 0: NVIDIA A100-SXM4-40GB
GPU 1: NVIDIA A100-SXM4-40GB
GPU 2: NVIDIA A100-SXM4-40GB
GPU 3: NVIDIA A100-SXM4-40GB
GPU 4: NVIDIA A100-SXM4-40GB
GPU 5: NVIDIA A100-SXM4-40GB
GPU 6: NVIDIA A100-SXM4-40GB
GPU 7: NVIDIA A100-SXM4-40GB

Nvidia driver version: 470.141.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.5.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.1
[pip3] torch==1.12.1+cu113
[pip3] torch-tb-profiler==0.4.0
[pip3] torchaudio==0.12.1+cu113
[pip3] torchvision==0.13.1+cu113
[conda] Could not collect



Top GitHub Comments

johnnv1 commented on Oct 19, 2022:

Not related to the problem, but @shijianjian's suggestion will also improve the import time a bit (from ~8 ms to ~0.2 ms).
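
For reference, CPython's `-X importtime` flag prints a per-module breakdown of import cost, which is one rough way to check such numbers yourself (timings vary by machine):

$ python -X importtime -c "import kornia" 2>&1 | tail -n 5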

ducha-aiki commented on Oct 19, 2022:

@shijianjian you are right. Let me fix that in adalam
