
os.environ["CUDA_VISIBLE_DEVICES"] = "" does not use CPU


🐛 Bug Report

Even when we set os.environ["CUDA_VISIBLE_DEVICES"] = "", the model is still trained on the GPU. This contradicts the documentation.

How To Reproduce

I simply copied the minimal linear regression example here.

Code sample

# I added these first two lines.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""


import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl


# data
num_samples, num_features = int(1e4), int(1e1)
X, y = torch.rand(num_samples, num_features), torch.rand(num_samples)
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [3, 6])

# model training
runner = dl.SupervisedRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir="./logdir",
    valid_loader="valid",
    valid_metric="loss",
    minimize_valid_metric=True,
    num_epochs=4,
    verbose=True
)

# check model on GPU or CPU
print('Is model on GPU? ', next(model.parameters()).is_cuda)

Screenshots

Output

Is model on GPU?  True

More info

Since the loader yields CPU tensors, calling runner.predict_batch(next(iter(loader))) raises the error below:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat1 in method wrapper_addmm)
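
The mismatch is easy to reproduce in plain PyTorch, independent of Catalyst. A minimal sketch (assuming a CUDA device is actually available): the model's parameters sit on cuda:0 while the loader's batches arrive on the CPU, and moving the batch to the model's device is the usual fix.

# Minimal sketch of the device mismatch, in plain PyTorch (not Catalyst).
# Assumes a CUDA device is available.
import torch

model = torch.nn.Linear(10, 1).cuda()   # model parameters live on cuda:0
batch = torch.rand(32, 10)              # DataLoader batches arrive on the CPU

# model(batch) here would raise the same RuntimeError as above.
device = next(model.parameters()).device
out = model(batch.to(device))           # moving the batch to the model's device fixes it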

Environment

Please copy and paste the output from our environment collection script:

catalyst-contrib --collect-env
# or manually
wget https://raw.githubusercontent.com/catalyst-team/catalyst/master/catalyst/contrib/scripts/collect_env.py
python collect_env.py

(or fill out the checklist below manually).

# example checklist, fill with your info
Catalyst version: 20.04
PyTorch version: 1.11.0
Python version: 3.9
CUDA runtime version: 11.4
Nvidia driver version: 472.39
cuDNN version: No CUDA

Additional context

If we set cpu=True in runner.train, then the model is indeed placed on the CPU.
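
For reference, a sketch of that workaround based on the report's description; only cpu=True changes relative to the code sample above.

# Workaround from the report: pass cpu=True to runner.train to force CPU
# training regardless of which GPUs are visible.
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    num_epochs=4,
    cpu=True,  # train on CPU even if CUDA is available
)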

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
ShuhuaGao commented on Jun 28, 2022

@Scitator. Thanks. I figured it out.

Moving the line to a separate first cell in Jupyter did not help at first. That was because I had not restarted the kernel (see StackOverflow).

So the key points are:

  • Set os.environ["CUDA_VISIBLE_DEVICES"] before import torch or any other import that may pull in torch. A safe way is to put it in the first cell.
  • If os.environ["CUDA_VISIBLE_DEVICES"] is changed later, e.g., when switching from CPU to GPU, the Jupyter kernel must be restarted; see the sketch below.
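
A minimal sketch of that ordering in a fresh process (or a freshly restarted kernel): the environment variable is set before torch is ever imported, so CUDA initializes with no visible devices.

# Must run before torch (or anything that imports torch) is first imported.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch
print(torch.cuda.is_available())  # False: no CUDA devices are visible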
0 reactions
Scitator commented on Jun 28, 2022

what about CPUEngine?
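
For context, recent Catalyst versions expose an engine abstraction for device selection. A sketch of the maintainer's suggestion, assuming the engine parameter of runner.train and dl.CPUEngine from the Catalyst 22.x API:

# Sketch assuming Catalyst 22.x's engine API: the engine selects the
# device explicitly, so no environment variable is needed.
from catalyst import dl

runner = dl.SupervisedRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    num_epochs=4,
    engine=dl.CPUEngine(),  # run the whole training loop on CPU
)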

Read more comments on GitHub.

Top Results From Across the Web

  • os.environ[CUDA_VISIBLE_DEVICES] does not work well
    Initially, I wrote this code in order to see the synchronization mechanism of parameters and buffers in multi-GPU training.
  • Tensorflow set CUDA_VISIBLE_DEVICES within jupyter
    I tried displaying tensorflow local devices as mentioned. My system outputs only device type: CPU. Does this mean that tensorflow is not running...
  • How to setting the GPU No. for training? · Issue #109
    It re-writes the environment variables and makes only certain NVIDIA GPU(s) visible for that process. import os os.environ["CUDA_DEVICE_ORDER"]= ...
  • Control GPU Visibility with CUDA_VISIBLE_DEVICES
    I'm not 100% sure, but I believe NVIDIA_VISIBLE_DEVICES is used by the NVIDIA Docker Runtime to select GPUs visible inside a container.
  • Dataparallel training (cpu, single/multi-gpu) — Catalyst 22.04 ...
    If you don't want to use GPUs at all you could set CUDA_VISIBLE_DEVICES="". In this case, do the following before your experiment...
