
AMP wants the model to be first sent to CUDA device

See original GitHub issue

Describe the bug
When training with fp16, AMP says that we need to provide a model with parameters located on a CUDA device.

To Reproduce
The problem is reproduced in this Kernel (Version 3).

Expected behavior
The model is expected to train normally.

Screenshots
Calling runner.train gives the following error:

When using amp.initialize, you need to provide a model with parameters
located on a CUDA device before passing it no matter what optimization level
you chose. Use model.to('cuda') to use the default device.

Additional context
The problem is solved if I pass model=model.cuda() to runner.train, but I don't think it's meant to be used that way.

Catalyst version is 19.11.
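
For reference, here is a minimal sketch of what Apex AMP expects when fp16 is enabled: the model's parameters have to be on the GPU before amp.initialize is called, which is exactly what the model=model.cuda() workaround above achieves. The model and optimizer below are illustrative placeholders, not the code from the Kernel in the issue.

```python
import torch
from torch import nn, optim
from apex import amp  # NVIDIA Apex, used by Catalyst 19.11 for fp16 training

# Illustrative model and optimizer, not the ones from the issue's Kernel.
model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=1e-3)

# amp.initialize raises the error quoted above if the parameters are still on the CPU,
# so the model has to be moved to the GPU first.
model = model.cuda()

model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
```

Catalyst runs this initialization internally when fp16 is requested, which is why the error surfaces from runner.train.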

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
TezRomacH commented, Nov 8, 2019

@Yorko try now with 19.11.1 please

0 reactions
Yorko commented, Nov 10, 2019

Looks like it’s fine now, same Kernel, 11th version. Closing.

Read more comments on GitHub >

Top Results From Across the Web

Automatic Mixed Precision package - torch.amp - PyTorch
torch.amp provides convenience methods for mixed precision, ... Creates model and optimizer in default precision model = Net().cuda() optimizer = optim.
Read more >
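
The torch.amp result above points to PyTorch's native mixed-precision API, which covers the same ground as Apex. A minimal sketch of that pattern, with an illustrative model, optimizer, and batch (not taken from the issue):

```python
import torch
from torch import nn, optim
from torch.cuda.amp import GradScaler, autocast

device = torch.device("cuda")

# Illustrative model, optimizer, and batch; here too the model goes to the GPU first.
model = nn.Linear(10, 2).to(device)
optimizer = optim.SGD(model.parameters(), lr=1e-3)
scaler = GradScaler()

inputs = torch.randn(8, 10, device=device)
targets = torch.randint(0, 2, (8,), device=device)

optimizer.zero_grad()
with autocast():                       # run the forward pass in mixed precision
    loss = nn.functional.cross_entropy(model(inputs), targets)
scaler.scale(loss).backward()          # scale the loss to avoid fp16 underflow
scaler.step(optimizer)
scaler.update()
```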
How to Optimize Data Transfers in CUDA C/C++
In this and the following post we begin our discussion of code optimization with how to efficiently transfer data between the host and...
Read more >
RuntimeError: Input type (torch.FloatTensor) and weight type ...
You get this error because your model is on the GPU, but your data is on the CPU. So, you need to send...
Read more >
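
That result describes the mirror-image mistake: the model is on the GPU while the batch is still on the CPU. A short illustrative sketch of the fix, moving the data to the model's device (names are placeholders):

```python
import torch
from torch import nn

device = torch.device("cuda")
model = nn.Linear(10, 2).to(device)    # model lives on the GPU

cpu_batch = torch.randn(8, 10)         # data loaded on the CPU
outputs = model(cpu_batch.to(device))  # move the batch to the model's device before the forward pass
```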
Efficient Training on a Single GPU - Hugging Face
from_pretrained("bert-large-uncased").to("cuda") >>> print_gpu_utilization() GPU memory occupied: 2631 MB. We can see that the model weights alone take up 1.3 ...
Read more >
Trainer — PyTorch Lightning 1.8.5.post0 documentation
This might be useful if you want to collect new metrics from a model right at its ... If your machine has GPUs,...
Read more >
