Validation fails with KeyError
It looks like I am doing something extremely stupid, so please bear with me. I am trying to run the simple AN4 training example from your README.md. Training during the first epoch seems to go fine, but validation crashes with a KeyError.
I am using PyTorch 0.4.0, since torchaudio now has an explicit dependency on 0.4.0. Here are my logs:
karora2@dp-gpu4:~/cont_entropy/asr/deepspeech.pytorch$ python train.py --train-manifest data/an4_train_manifest.csv --val-manifest data/an4_train_manifest.csv --cuda
Model Save directory already exists.
DataParallel(
(module): DeepSpeech(
(conv): Sequential(
(0): Conv2d(1, 32, kernel_size=(41, 11), stride=(2, 2), padding=(0, 10))
(1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): Hardtanh(min_val=0, max_val=20, inplace)
(3): Conv2d(32, 32, kernel_size=(21, 11), stride=(2, 1))
(4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): Hardtanh(min_val=0, max_val=20, inplace)
)
(rnns): Sequential(
(0): BatchRNN(
(rnn): GRU(672, 800, bias=False, bidirectional=True)
)
(1): BatchRNN(
(batch_norm): SequenceWise (
BatchNorm1d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
(rnn): GRU(800, 800, bias=False, bidirectional=True)
)
(2): BatchRNN(
(batch_norm): SequenceWise (
BatchNorm1d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
(rnn): GRU(800, 800, bias=False, bidirectional=True)
)
(3): BatchRNN(
(batch_norm): SequenceWise (
BatchNorm1d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
(rnn): GRU(800, 800, bias=False, bidirectional=True)
)
(4): BatchRNN(
(batch_norm): SequenceWise (
BatchNorm1d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
(rnn): GRU(800, 800, bias=False, bidirectional=True)
)
)
(fc): Sequential(
(0): SequenceWise (
Sequential(
(0): BatchNorm1d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(1): Linear(in_features=800, out_features=29, bias=False)
))
)
(inference_softmax): InferenceBatchSoftmax()
)
)
Number of parameters: 38067968
/home/ml/karora2/cont_entropy/asr/deepspeech.pytorch/model.py:67: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
x, _ = self.rnn(x)
train.py:304: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
torch.nn.utils.clip_grad_norm(model.parameters(), args.max_norm)
Epoch: [1][1/45] Time 3.612 (3.612) Data 0.239 (0.239) Loss 130.2365 (130.2365)
Epoch: [1][2/45] Time 0.160 (1.886) Data 0.003 (0.121) Loss 121.0078 (125.6221)
Epoch: [1][3/45] Time 0.172 (1.315) Data 0.004 (0.082) Loss 99.7183 (116.9875)
Epoch: [1][4/45] Time 0.181 (1.031) Data 0.003 (0.062) Loss 90.0657 (110.2571)
Epoch: [1][5/45] Time 0.185 (0.862) Data 0.004 (0.051) Loss 73.9919 (103.0040)
Epoch: [1][6/45] Time 0.192 (0.750) Data 0.000 (0.042) Loss 77.1043 (98.6874)
Epoch: [1][7/45] Time 0.193 (0.671) Data 0.002 (0.037) Loss 73.6352 (95.1085)
Epoch: [1][8/45] Time 0.201 (0.612) Data 0.002 (0.032) Loss 62.7453 (91.0631)
Epoch: [1][9/45] Time 0.203 (0.567) Data 0.002 (0.029) Loss 65.0267 (88.1702)
Epoch: [1][10/45] Time 0.216 (0.532) Data 0.002 (0.026) Loss 62.0836 (85.5615)
Epoch: [1][11/45] Time 0.223 (0.504) Data 0.002 (0.024) Loss 52.0819 (82.5179)
Epoch: [1][12/45] Time 0.222 (0.480) Data 0.002 (0.022) Loss 57.3895 (80.4239)
Epoch: [1][13/45] Time 0.230 (0.461) Data 0.002 (0.021) Loss 60.3037 (78.8762)
Epoch: [1][14/45] Time 0.229 (0.444) Data 0.002 (0.019) Loss 76.2436 (78.6881)
Epoch: [1][15/45] Time 0.239 (0.431) Data 0.002 (0.018) Loss 70.9544 (78.1726)
Epoch: [1][16/45] Time 0.238 (0.419) Data 0.002 (0.017) Loss 65.8857 (77.4046)
Epoch: [1][17/45] Time 0.244 (0.408) Data 0.002 (0.016) Loss 66.5676 (76.7672)
Epoch: [1][18/45] Time 0.253 (0.400) Data 0.002 (0.016) Loss 52.6424 (75.4269)
Epoch: [1][19/45] Time 0.252 (0.392) Data 0.003 (0.015) Loss 61.6683 (74.7028)
Epoch: [1][20/45] Time 0.260 (0.385) Data 0.002 (0.014) Loss 61.2333 (74.0293)
Epoch: [1][21/45] Time 0.260 (0.379) Data 0.002 (0.014) Loss 63.9080 (73.5473)
Epoch: [1][22/45] Time 0.264 (0.374) Data 0.002 (0.013) Loss 66.6633 (73.2344)
Epoch: [1][23/45] Time 0.266 (0.369) Data 0.002 (0.013) Loss 63.8530 (72.8265)
Epoch: [1][24/45] Time 0.275 (0.366) Data 0.002 (0.012) Loss 70.0406 (72.7104)
Epoch: [1][25/45] Time 0.279 (0.362) Data 0.003 (0.012) Loss 68.1581 (72.5283)
Epoch: [1][26/45] Time 0.287 (0.359) Data 0.002 (0.011) Loss 67.8396 (72.3480)
Epoch: [1][27/45] Time 0.288 (0.357) Data 0.002 (0.011) Loss 63.9558 (72.0372)
Epoch: [1][28/45] Time 0.294 (0.354) Data 0.002 (0.011) Loss 64.3709 (71.7634)
Epoch: [1][29/45] Time 0.301 (0.352) Data 0.002 (0.011) Loss 73.2265 (71.8139)
Epoch: [1][30/45] Time 0.308 (0.351) Data 0.002 (0.010) Loss 51.9908 (71.1531)
Epoch: [1][31/45] Time 0.309 (0.350) Data 0.002 (0.010) Loss 70.9007 (71.1449)
Epoch: [1][32/45] Time 0.315 (0.349) Data 0.002 (0.010) Loss 73.4481 (71.2169)
Epoch: [1][33/45] Time 0.320 (0.348) Data 0.003 (0.010) Loss 62.2275 (70.9445)
Epoch: [1][34/45] Time 0.331 (0.347) Data 0.002 (0.009) Loss 86.2832 (71.3956)
Epoch: [1][35/45] Time 0.337 (0.347) Data 0.002 (0.009) Loss 73.4765 (71.4551)
Epoch: [1][36/45] Time 0.351 (0.347) Data 0.002 (0.009) Loss 77.1708 (71.6139)
Epoch: [1][37/45] Time 0.360 (0.347) Data 0.002 (0.009) Loss 87.4664 (72.0423)
Epoch: [1][38/45] Time 0.366 (0.348) Data 0.002 (0.009) Loss 79.0521 (72.2268)
Epoch: [1][39/45] Time 0.380 (0.349) Data 0.002 (0.008) Loss 71.7237 (72.2139)
Epoch: [1][40/45] Time 0.386 (0.350) Data 0.002 (0.008) Loss 73.2224 (72.2391)
Epoch: [1][41/45] Time 0.395 (0.351) Data 0.002 (0.008) Loss 68.6490 (72.1515)
Epoch: [1][42/45] Time 0.416 (0.352) Data 0.002 (0.008) Loss 60.9438 (71.8847)
Epoch: [1][43/45] Time 0.445 (0.354) Data 0.002 (0.008) Loss 83.2680 (72.1494)
Epoch: [1][44/45] Time 0.494 (0.358) Data 0.002 (0.008) Loss 64.0922 (71.9663)
Epoch: [1][45/45] Time 0.521 (0.361) Data 0.002 (0.008) Loss 89.0631 (72.2903)
Training Summary Epoch: [1] Time taken (s): 16 Average Loss 72.346
0%| | 0/45 [00:00<?, ?it/s]
train.py:344: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
inputs = Variable(inputs, volatile=True)
Traceback (most recent call last):
File "train.py", line 360, in <module>
decoded_output, _ = decoder.decode(out.data, sizes)
File "/home/ml/karora2/cont_entropy/asr/deepspeech.pytorch/decoder.py", line 196, in decode
remove_repetitions=True, return_offsets=True)
File "/home/ml/karora2/cont_entropy/asr/deepspeech.pytorch/decoder.py", line 156, in convert_to_strings
string, string_offsets = self.process_string(sequences[x], seq_len, remove_repetitions)
File "/home/ml/karora2/cont_entropy/asr/deepspeech.pytorch/decoder.py", line 169, in process_string
char = self.int_to_char[sequence[i]]
KeyError: tensor(0, device='cuda:0')
Any help would be appreciated.
Issue Analytics
- Created 5 years ago
- Comments: 5 (1 by maintainers)
Top GitHub Comments
This is an obvious error: the keys of the dict self.int_to_char are Python integers, not tensors, and in PyTorch 0.4 indexing a tensor returns a 0-dim tensor rather than an int. You can fix it by changing line 169 of decoder.py to:
char = self.int_to_char[sequence[i].item()]
That should be fine.
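For anyone landing here later, here is a minimal, self-contained sketch of the failure and the fix. The toy label map below is illustrative, not the repo's actual alphabet; the point is that a 0-dim tensor hashes by identity, so it never matches the dict's int keys:

```python
import torch

int_to_char = {0: '_', 1: 'a', 2: 'b'}  # toy label map, stand-in for the decoder's
sequence = torch.tensor([0, 1, 2])      # decoded indices, as decode() would return

try:
    char = int_to_char[sequence[0]]     # sequence[0] is tensor(0), a 0-dim tensor
except KeyError as e:
    print('KeyError:', e)               # KeyError: tensor(0)

char = int_to_char[sequence[0].item()]  # .item() converts the 0-dim tensor to int
print(char)                             # '_'
```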
Thank you very much!
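Separately from the crash, the logs also show the `volatile` deprecation warning from train.py line 344. A minimal sketch of the PyTorch 0.4+ replacement, using stand-in names (a tiny linear model and a random batch) rather than the repo's actual validation loop:

```python
import torch

model = torch.nn.Linear(4, 2)  # stand-in for the DeepSpeech model
inputs = torch.randn(3, 4)     # stand-in for a validation batch

model.eval()
with torch.no_grad():          # replaces Variable(inputs, volatile=True)
    out = model(inputs)
print(out.requires_grad)       # False: no autograd graph was built
```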