[Bug] RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Describe the bug
I’m trying to run Tacotron2 training, but I get RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
To Reproduce
CUDA_VISIBLE_DEVICES="0" python3 train_tacotron_ddc.py
Expected behavior
No response
Logs
admin@8f7837b57ed6:~/TTS$ CUDA_VISIBLE_DEVICES="0" python3 train_tacotron_ddc.py
> Setting up Audio Processor...
| > sample_rate:22050
| > resample:False
| > num_mels:80
| > log_func:np.log
| > min_level_db:-100
| > frame_shift_ms:None
| > frame_length_ms:None
| > ref_level_db:20
| > fft_size:1024
| > power:1.5
| > preemphasis:0.0
| > griffin_lim_iters:60
| > signal_norm:False
| > symmetric_norm:True
| > mel_fmin:0
| > mel_fmax:8000
| > pitch_fmin:0.0
| > pitch_fmax:640.0
| > spec_gain:1.0
| > stft_pad_mode:reflect
| > max_norm:4.0
| > clip_norm:True
| > do_trim_silence:True
| > trim_db:60.0
| > do_sound_norm:False
| > do_amp_to_db_linear:True
| > do_amp_to_db_mel:True
| > do_rms_norm:False
| > db_level:None
| > stats_path:None
| > base:2.718281828459045
| > hop_length:256
| > win_length:1024
> Setting up Audio Processor...
| > sample_rate:22050
| > resample:False
| > num_mels:80
| > log_func:np.log
| > min_level_db:-100
| > frame_shift_ms:None
| > frame_length_ms:None
| > ref_level_db:20
| > fft_size:1024
| > power:1.5
| > preemphasis:0.0
| > griffin_lim_iters:60
| > signal_norm:False
| > symmetric_norm:True
| > mel_fmin:0
| > mel_fmax:8000
| > pitch_fmin:0.0
| > pitch_fmax:640.0
| > spec_gain:1.0
| > stft_pad_mode:reflect
| > max_norm:4.0
| > clip_norm:True
| > do_trim_silence:True
| > trim_db:60.0
| > do_sound_norm:False
| > do_amp_to_db_linear:True
| > do_amp_to_db_mel:True
| > do_rms_norm:False
| > db_level:None
| > stats_path:None
| > base:2.718281828459045
| > hop_length:256
| > win_length:1024
| > Found 9039 files in /home/admin/M-AI-Labs/resampled_to_22050/by_book/male/minaev/oblomov
> Using CUDA: True
> Number of GPUs: 1
> Model has 47669492 parameters
> Number of output frames: 6
> EPOCH: 0/1000
--> /home/admin/TTS/run-August-02-2022_11+05AM-903a77c1
> DataLoader initialization
| > Tokenizer:
| > add_blank: False
| > use_eos_bos: False
| > use_phonemes: True
| > phonemizer:
| > phoneme language: ru-ru
| > phoneme backend: gruut
| > Number of instances : 8949
| > Preprocessing samples
| > Max text length: 216
| > Min text length: 3
| > Avg text length: 99.18292546653258
|
| > Max audio length: 583682.0
| > Min audio length: 26014.0
| > Avg audio length: 182216.04805006145
| > Num. instances discarded samples: 0
| > Batch group size: 0.
> TRAINING (2022-08-02 11:05:38)
/home/admin/TTS/TTS/tts/models/tacotron2.py:331: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
alignment_lengths = (
! Run is removed from /home/admin/TTS/run-August-02-2022_11+05AM-903a77c1
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1534, in fit
self._fit()
File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1518, in _fit
self.train_epoch()
File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1283, in train_epoch
_, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 1115, in train_step
outputs, loss_dict_new, step_time = self._optimize(
File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 999, in _optimize
outputs, loss_dict = self._model_train_step(batch, model, criterion)
File "/usr/local/lib/python3.8/dist-packages/trainer/trainer.py", line 955, in _model_train_step
return model.train_step(*input_args)
File "/home/admin/TTS/TTS/tts/models/tacotron2.py", line 339, in train_step
loss_dict = criterion(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/admin/TTS/TTS/tts/layers/losses.py", line 440, in forward
self.criterion_st(stopnet_output, stopnet_target, stop_target_length)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/admin/TTS/TTS/tts/layers/losses.py", line 193, in forward
loss = functional.binary_cross_entropy_with_logits(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 3150, in binary_cross_entropy_with_logits
return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
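For reference, the failing call at the bottom of the traceback reduces to the pattern below. This is a minimal, hypothetical reproduction (not code from TTS; it assumes a CUDA-capable machine) showing that binary_cross_entropy_with_logits raises the same RuntimeError whenever its input and target live on different devices:

import torch
from torch.nn import functional

logits = torch.zeros(4, device="cuda:0")  # model output placed on the GPU
labels = torch.zeros(4)                   # target tensor left on the CPU

# Raises: RuntimeError: Expected all tensors to be on the same device,
# but found at least two devices, cuda:0 and cpu!
loss = functional.binary_cross_entropy_with_logits(logits, labels)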
Environment
{
"CUDA": {
"GPU": [
"NVIDIA GeForce RTX 2080 Ti",
"NVIDIA GeForce RTX 2080 Ti"
],
"available": true,
"version": "10.2"
},
"Packages": {
"PyTorch_debug": false,
"PyTorch_version": "1.12.0+cu102",
"TTS": "0.7.1",
"numpy": "1.21.6"
},
"System": {
"OS": "Linux",
"architecture": [
"64bit",
"ELF"
],
"processor": "x86_64",
"python": "3.8.10",
"version": "#36~20.04.1-Ubuntu SMP Fri Aug 27 08:06:32 UTC 2021"
}
}
Additional context
No response
Top GitHub Comments
Fixed by #1872
For a precise analysis of the error, add a few print statements in the
TTS/TTS/tts/layers/losses.py
file to see each tensor's device. Then you can call
tensor.cuda()
to move a tensor to the GPU. You can also apply the fix directly, without the debugging step.
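A minimal sketch of that debugging approach, using the argument names visible in the traceback (stopnet_output, stopnet_target); the helper function itself is illustrative and is not the exact code in losses.py or the patch from #1872:

import torch
from torch.nn import functional

def stopnet_bce(stopnet_output, stopnet_target):
    # Print each tensor's device to locate the mismatch.
    print("stopnet_output:", stopnet_output.device)  # expected: cuda:0
    print("stopnet_target:", stopnet_target.device)  # cpu here triggers the error
    # Move the target onto the output's device before computing the loss.
    stopnet_target = stopnet_target.to(stopnet_output.device)
    return functional.binary_cross_entropy_with_logits(stopnet_output, stopnet_target)

Using .to(stopnet_output.device) rather than a hard-coded .cuda() keeps the fix correct on CPU-only runs and when the model lives on a GPU other than cuda:0.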