
[Bug] RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

See original GitHub issue

Describe the bug

I’m trying to run Tacotron2 training, but I get RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

To Reproduce

CUDA_VISIBLE_DEVICES="0" python3 train_tacotron_ddc.py

Expected behavior

No response

Logs

admin@8f7837b57ed6:~/TTS$ CUDA_VISIBLE_DEVICES="0" python3 train_tacotron_ddc.py
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:False
 | > symmetric_norm:True
 | > mel_fmin:0
 | > mel_fmax:8000
 | > pitch_fmin:0.0
 | > pitch_fmax:640.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:60.0
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:2.718281828459045
 | > hop_length:256
 | > win_length:1024
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:False
 | > symmetric_norm:True
 | > mel_fmin:0
 | > mel_fmax:8000
 | > pitch_fmin:0.0
 | > pitch_fmax:640.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:60.0
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:2.718281828459045
 | > hop_length:256
 | > win_length:1024
 | > Found 9039 files in /home/admin/M-AI-Labs/resampled_to_22050/by_book/male/minaev/oblomov
 > Using CUDA: True
 > Number of GPUs: 1

 > Model has 47669492 parameters

 > Number of output frames: 6

 > EPOCH: 0/1000
 --> /home/admin/TTS/run-August-02-2022_11+05AM-903a77c1


> DataLoader initialization
| > Tokenizer:
        | > add_blank: False
        | > use_eos_bos: False
        | > use_phonemes: True
        | > phonemizer:
                | > phoneme language: ru-ru
                | > phoneme backend: gruut
| > Number of instances : 8949
 | > Preprocessing samples
 | > Max text length: 216
 | > Min text length: 3
 | > Avg text length: 99.18292546653258
 |
 | > Max audio length: 583682.0
 | > Min audio length: 26014.0
 | > Avg audio length: 182216.04805006145
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.

 > TRAINING (2022-08-02 11:05:38)
/home/admin/TTS/TTS/tts/models/tacotron2.py:331: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  alignment_lengths = (
 ! Run is removed from /home/admin/TTS/run-August-02-2022_11+05AM-903a77c1
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1534, in fit
    self._fit()
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1518, in _fit
    self.train_epoch()
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1283, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1115, in train_step
    outputs, loss_dict_new, step_time = self._optimize(
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 999, in _optimize
    outputs, loss_dict = self._model_train_step(batch, model, criterion)
  File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 955, in _model_train_step
    return model.train_step(*input_args)
  File "/home/admin/TTS/TTS/tts/models/tacotron2.py", line 339, in train_step
    loss_dict = criterion(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/admin/TTS/TTS/tts/layers/losses.py", line 440, in forward
    self.criterion_st(stopnet_output, stopnet_target, stop_target_length)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/admin/TTS/TTS/tts/layers/losses.py", line 193, in forward
    loss = functional.binary_cross_entropy_with_logits(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 3150, in binary_cross_entropy_with_logits
    return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Environment

{
    "CUDA": {
        "GPU": [
            "NVIDIA GeForce RTX 2080 Ti",
            "NVIDIA GeForce RTX 2080 Ti"
        ],
        "available": true,
        "version": "10.2"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "1.12.0+cu102",
        "TTS": "0.7.1",
        "numpy": "1.21.6"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.8.10",
        "version": "#36~20.04.1-Ubuntu SMP Fri Aug 27 08:06:32 UTC 2021"
    }
}

Additional context

No response

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

1 reaction
erogol commented, Aug 15, 2022

Fixed by #1872

1 reaction
p0p4k commented, Aug 6, 2022

For a precise analysis of the error, add the following code to TTS/TTS/tts/layers/losses.py so you can see which device each tensor is on:

# add immediately above L193
tensors_to_check = [x.masked_select(mask), target.masked_select(mask), self.pos_weight]
for t in tensors_to_check:
    try:
        print(f"tensor {t} is on GPU device - {t.get_device()}")
    except Exception:
        print(f"tensor {t} is on cpu")

# add immediately above L197
tensors_to_check = [x, target, self.pos_weight]
for t in tensors_to_check:
    try:
        print(f"tensor {t} is on GPU device - {t.get_device()}")
    except Exception:
        print(f"tensor {t} is on cpu")
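
An equivalent, simpler check is to print each tensor's .device attribute directly (a minimal sketch using the same variable names as the snippet above):

# same spot in losses.py; prints e.g. "x: cuda:0" and "pos_weight: cpu"
for name, t in [("x", x), ("target", target), ("pos_weight", self.pos_weight)]:
    print(f"{name}: {t.device}")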

Once you know which tensor is on the CPU, you can call tensor.cuda() on it to move it to the GPU. You can also apply that fix directly, without the diagnostic step above.
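
To make the failure mode concrete, here is a minimal, self-contained sketch of both the error and the fix; the tensor names, shapes, and values are illustrative, not taken from the TTS code:

import torch
import torch.nn.functional as F

x = torch.randn(4, 1, device="cuda:0")                         # model output on the GPU
target = torch.randint(0, 2, (4, 1), device="cuda:0").float()  # targets on the GPU
pos_weight = torch.tensor([10.0])                              # created on the CPU by default

# This reproduces the error: pos_weight is on the CPU while x and target are on cuda:0.
# F.binary_cross_entropy_with_logits(x, target, pos_weight=pos_weight)

# Moving the offending tensor to the inputs' device (or calling .cuda() on it) fixes it.
loss = F.binary_cross_entropy_with_logits(x, target, pos_weight=pos_weight.to(x.device))
print(loss.item())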
