Can't train with fp16 on Nvidia P100
Training with fp16 doesn't work for me on a P100. I'll look into fixing it, but for future reference, here is the full stack trace (torch version 1.9.0):
2021-06-29 10:29:09.537741: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
/usr/local/lib/python3.7/dist-packages/torch/functional.py:472: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at /pytorch/aten/src/ATen/native/SpectralOps.cpp:664.)
normalized, onesided, return_complex)
Traceback (most recent call last):
File "train_ms.py", line 294, in <module>
main()
File "train_ms.py", line 50, in main
mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/content/vits/train_ms.py", line 118, in run
train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
File "/content/vits/train_ms.py", line 192, in train_and_evaluate
scaler.scale(loss_gen_all).backward()
File "/usr/local/lib/python3.7/dist-packages/torch/_tensor.py", line 255, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 149, in backward
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: "fill_cuda" not implemented for 'ComplexHalf'
I think a better way to solve this problem is to wrap the torch.stft call with autocast(enabled=False) inside the mel_spectrogram_torch function. I created a pull request with this change; it has been working for me with 3090 GPUs and torch 1.9.