Error when training on multi-GPU
When I train on multi-GPU I get the error below, but when I train on a single GPU the error does not appear.
ValueError: gather got an input of invalid size: got 10x110x29, but expected 10x226x29
```
python -m multiproc train.py --train-manifest qkids/manifest/qkids_train_manifest_limit_250.csv --val-manifest qkids/manifest/qkids_test_manifest_limit_never_train.csv --cuda --model-path models/libri_final_and_limit.pth --epochs 50 --checkpoint --checkpoint-per-batch 1000 --batch-size 20

['train.py', '--train-manifest', 'qkids/manifest/qkids_train_manifest_limit_250.csv', '--val-manifest', 'qkids/manifest/qkids_test_manifest_limit_never_train.csv', '--cuda', '--model-path', 'models/libri_final_and_limit.pth', '--epochs', '50', '--checkpoint', '--checkpoint-per-batch', '1000', '--batch-size', '20', '--world-size', '2', '--rank', '0', '--gpu-rank', '0']
['train.py', '--train-manifest', 'qkids/manifest/qkids_train_manifest_limit_250.csv', '--val-manifest', 'qkids/manifest/qkids_test_manifest_limit_never_train.csv', '--cuda', '--model-path', 'models/libri_final_and_limit.pth', '--epochs', '50', '--checkpoint', '--checkpoint-per-batch', '1000', '--batch-size', '20', '--world-size', '2', '--rank', '1', '--gpu-rank', '1']

DistributedDataParallel(
  (module): DeepSpeech(
    (conv): MaskConv(
      (seq_module): Sequential(
        (0): Conv2d(1, 32, kernel_size=(41, 11), stride=(2, 2), padding=(20, 5))
        (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): Hardtanh(min_val=0, max_val=20, inplace)
        (3): Conv2d(32, 32, kernel_size=(21, 11), stride=(2, 1), padding=(10, 5))
        (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): Hardtanh(min_val=0, max_val=20, inplace)
      )
    )
    (rnns): Sequential(
      (0): BatchRNN(
        (rnn): GRU(1312, 800, bidirectional=True)
      )
      (1): BatchRNN(
        (batch_norm): SequenceWise (
          BatchNorm1d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
        (rnn): GRU(800, 800, bidirectional=True)
      )
      (2): BatchRNN(
        (batch_norm): SequenceWise (
          BatchNorm1d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
        (rnn): GRU(800, 800, bidirectional=True)
      )
      (3): BatchRNN(
        (batch_norm): SequenceWise (
          BatchNorm1d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
        (rnn): GRU(800, 800, bidirectional=True)
      )
      (4): BatchRNN(
        (batch_norm): SequenceWise (
          BatchNorm1d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
        (rnn): GRU(800, 800, bidirectional=True)
      )
    )
    (fc): Sequential(
      (0): SequenceWise (
        Sequential(
          (0): BatchNorm1d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (1): Linear(in_features=800, out_features=29, bias=False)
        ))
    )
    (inference_softmax): InferenceBatchSoftmax()
  )
)
Number of parameters: 41187968

/home/luozhiping/workspace/speech/deepspeech.pytorch/model_new.py:98: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
  x, h = self.rnn(x)

Traceback (most recent call last):
  File "train.py", line 248, in <module>
    out, output_sizes = model(inputs, input_sizes)
  File "/home/luozhiping/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/luozhiping/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 217, in forward
    return self.gather(outputs, self.output_device)
  File "/home/luozhiping/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 226, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/home/luozhiping/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
    return gather_map(outputs)
  File "/home/luozhiping/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/home/luozhiping/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
    return Gather.apply(target_device, dim, *outputs)
  File "/home/luozhiping/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 55, in forward
    return comm.gather(inputs, ctx.dim, ctx.target_device)
  File "/home/luozhiping/anaconda3/lib/python3.6/site-packages/torch/cuda/comm.py", line 186, in gather
    "but expected {}".format(got, expected))
ValueError: gather got an input of invalid size: got 10x110x29, but expected 10x226x29
terminate called after throwing an instance of 'gloo::EnforceNotMet'
  what(): [enforce fail at /opt/conda/conda-bld/pytorch_1524586445097/work/third_party/gloo/gloo/cuda.cu:249] error == cudaSuccess. 29 vs 0. Error at: /opt/conda/conda-bld/pytorch_1524586445097/work/third_party/gloo/gloo/cuda.cu:249: driver shutting down
```
Top GitHub Comments
If you follow the steps from @slavaGanzin to get past the invalid size error, you can get past the subsequent assertion error by putting output_lengths on the GPU in DeepSpeech:
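The exact snippet from that comment is not preserved above; as a minimal sketch of the idea, assuming a DeepSpeech.forward roughly like the one in deepspeech.pytorch's model.py (with a get_seq_lens helper that maps input lengths through the conv stack):

```python
# Sketch only, not the exact patch from the comment.
# Keep output_lengths on the GPU so DistributedDataParallel can gather it
# together with the model activations instead of tripping on a CPU tensor.
def forward(self, x, lengths):
    output_lengths = self.get_seq_lens(lengths).cuda()  # move lengths to GPU
    x, _ = self.conv(x, output_lengths)
    # ... remaining conv reshaping, RNN and FC layers unchanged ...
    return x, output_lengths
```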
You will also need to ensure the output lengths returned (into output_sizes) are back on the CPU for the CTC loss in the training loop:
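Again as a sketch, assuming the training loop computes a warp-ctc style loss of the form criterion(out, targets, output_sizes, target_sizes), as train.py does:

```python
out, output_sizes = model(inputs, input_sizes)
out = out.transpose(0, 1)  # TxNxH layout expected by the CTC loss

# warp-ctc expects the length tensors as CPU IntTensors, so bring the
# gathered output lengths back off the GPU before computing the loss.
loss = criterion(out, targets, output_sizes.cpu().int(), target_sizes)
```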
These steps allowed me to run on multi-GPU seemingly without issue.
You should use the total_length argument of pad_packed_sequence:
https://pytorch.org/docs/stable/notes/faq.html#my-recurrent-network-doesn-t-work-with-data-parallelism
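The linked FAQ entry boils down to remembering the padded length of the whole batch before packing and passing it back as total_length when unpacking, so every data-parallel replica pads its RNN output to the same time dimension and gather() sees matching sizes. A sketch along the lines of the FAQ example (module and tensor names are assumptions, not the actual deepspeech.pytorch code):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class PackedRNN(nn.Module):
    """Toy module illustrating the total_length fix from the PyTorch FAQ."""

    def __init__(self, input_size=161, hidden_size=800):
        super().__init__()
        self.rnn = nn.GRU(input_size, hidden_size, batch_first=True)

    def forward(self, padded_input, input_lengths):
        # Padded length of the *whole* batch. Without total_length, each
        # replica pads only to its own longest sequence, which produces the
        # mismatched sizes seen in the gather error above (110 vs 226).
        total_length = padded_input.size(1)
        packed = pack_padded_sequence(padded_input, input_lengths,
                                      batch_first=True)
        packed_out, _ = self.rnn(packed)
        out, _ = pad_packed_sequence(packed_out, batch_first=True,
                                     total_length=total_length)
        return out
```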