
os.environ["CUDA_VISIBLE_DEVICES"] = "" does not use CPU


🐛 Bug Report

Even when we set os.environ["CUDA_VISIBLE_DEVICES"] = "", the model is still trained on the GPU. This contradicts the documentation.

How To Reproduce

I simply copied the minimal linear regression example here.

Code sample

# I added these first two lines.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""


import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl


# data
num_samples, num_features = int(1e4), int(1e1)
X, y = torch.rand(num_samples, num_features), torch.rand(num_samples)
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [3, 6])

# model training
runner = dl.SupervisedRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir="./logdir",
    valid_loader="valid",
    valid_metric="loss",
    minimize_valid_metric=True,
    num_epochs=4,
    verbose=True
)

# check model on GPU or CPU
print('Is model on GPU? ', next(model.parameters()).is_cuda)

Screenshots

Output

Is model on GPU?  True

More info

Since the loader yields CPU tensors, calling runner.predict_batch(next(iter(loader))) raises the error below:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_addmm)
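
The mismatch is easy to reproduce in plain PyTorch, independent of Catalyst. A minimal sketch (assuming a CUDA device is actually available): the model's parameters sit on cuda:0 while the loader's batches arrive on the CPU, and moving the batch to the model's device is the usual fix.

# Minimal sketch of the device mismatch, in plain PyTorch (not Catalyst).
# Assumes a CUDA device is available.
import torch

model = torch.nn.Linear(10, 1).cuda()   # model parameters live on cuda:0
batch = torch.rand(32, 10)              # DataLoader batches arrive on the CPU

# model(batch) here would raise the same RuntimeError as above.
device = next(model.parameters()).device
out = model(batch.to(device))           # moving the batch to the model's device fixes it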

Environment

Please copy and paste the output from our environment collection script:

catalyst-contrib --collect-env
# or manually
wget https://raw.githubusercontent.com/catalyst-team/catalyst/master/catalyst/contrib/scripts/collect_env.py
python collect_env.py

(or fill out the checklist below manually).

# example checklist, fill with your info
Catalyst version: 20.04
PyTorch version: 1.11.0
Python version: 3.9
CUDA runtime version: 11.4
Nvidia driver version: 472.39
cuDNN version: No CUDA

Additional context

If we set cpu=True in runner.train, then the model is indeed placed on the CPU.
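
For reference, a sketch of that workaround based on the report's description; only cpu=True changes relative to the code sample above.

# Workaround from the report: pass cpu=True to runner.train to force CPU
# training regardless of which GPUs are visible.
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    num_epochs=4,
    cpu=True,  # train on CPU even if CUDA is available
)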

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
ShuhuaGao commented on Jun 28, 2022

@Scitator. Thanks. I figured it out.

Moving the line to a separate first cell in Jupyter did not help at first. That was because I had not restarted the kernel (see StackOverflow).

So the key points are:

  • Set os.environ["CUDA_VISIBLE_DEVICES"] before import torch or any other import that may pull in torch. A safe way is to put it in the first cell.
  • If os.environ["CUDA_VISIBLE_DEVICES"] is changed later, e.g., when switching from CPU to GPU, the Jupyter kernel must be restarted; see the sketch below.
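
A minimal sketch of that ordering in a fresh process (or a freshly restarted kernel): the environment variable is set before torch is ever imported, so CUDA initializes with no visible devices.

# Must run before torch (or anything that imports torch) is first imported.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch
print(torch.cuda.is_available())  # False: no CUDA devices are visible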
0 reactions
Scitator commented on Jun 28, 2022

what about CPUEngine?
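
For context, recent Catalyst versions expose an engine abstraction for device selection. A sketch of the maintainer's suggestion, assuming the engine parameter of runner.train and dl.CPUEngine from the Catalyst 22.x API:

# Sketch assuming Catalyst 22.x's engine API: the engine selects the
# device explicitly, so no environment variable is needed.
from catalyst import dl

runner = dl.SupervisedRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    num_epochs=4,
    engine=dl.CPUEngine(),  # run the whole training loop on CPU
)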

Read more comments on GitHub.

Top Results From Across the Web

  • os.environ[CUDA_VISIBLE_DEVICES] does not work well
    Initially, I wrote this code in order to see the synchronization mechanism of parameters and buffers in multi-GPU training.
  • Tensorflow set CUDA_VISIBLE_DEVICES within jupyter
    I tried displaying tensorflow local devices as mentioned. My system outputs only device type: CPU. Does this mean that tensorflow is not running...
  • How to setting the GPU No. for training? · Issue #109
    It re-writes the environment variables and makes only certain NVIDIA GPU(s) visible for that process. import os os.environ["CUDA_DEVICE_ORDER"]= ...
  • Control GPU Visibility with CUDA_VISIBLE_DEVICES
    I'm not 100% sure, but I believe NVIDIA_VISIBLE_DEVICES is used by the NVIDIA Docker Runtime to select GPUs visible inside a container.
  • Dataparallel training (cpu, single/multi-gpu) — Catalyst 22.04 ...
    If you don't want to use GPUs at all you could set CUDA_VISIBLE_DEVICES="". In this case, do the following before your experiment...
