Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Validation fails with KeyError

See original GitHub issue

It looks like I am doing something extremely stupid, so please bear with me. I am trying to run the simple AN4 training example from your README.md. Training during the first epoch seems to go fine, but validation crashes with a KeyError.

I am using PyTorch 0.4.0, as torchaudio now has an explicit dependency on 0.4.0. Here are my logs:


karora2@dp-gpu4:~/cont_entropy/asr/deepspeech.pytorch$  python train.py --train-manifest data/an4_train_manifest.csv --val-manifest data/an4_train_manifest.csv  --cuda
Model Save directory already exists.
DataParallel(
  (module): DeepSpeech(
    (conv): Sequential(
      (0): Conv2d(1, 32, kernel_size=(41, 11), stride=(2, 2), padding=(0, 10))
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): Hardtanh(min_val=0, max_val=20, inplace)
      (3): Conv2d(32, 32, kernel_size=(21, 11), stride=(2, 1))
      (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): Hardtanh(min_val=0, max_val=20, inplace)
    )
    (rnns): Sequential(
      (0): BatchRNN(
        (rnn): GRU(672, 800, bias=False, bidirectional=True)
      )
      (1): BatchRNN(
        (batch_norm): SequenceWise (
        BatchNorm1d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
        (rnn): GRU(800, 800, bias=False, bidirectional=True)
      )
      (2): BatchRNN(
        (batch_norm): SequenceWise (
        BatchNorm1d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
        (rnn): GRU(800, 800, bias=False, bidirectional=True)
      )
      (3): BatchRNN(
        (batch_norm): SequenceWise (
        BatchNorm1d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
        (rnn): GRU(800, 800, bias=False, bidirectional=True)
      )
      (4): BatchRNN(
        (batch_norm): SequenceWise (
        BatchNorm1d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True))
        (rnn): GRU(800, 800, bias=False, bidirectional=True)
      )
    )
    (fc): Sequential(
      (0): SequenceWise (
      Sequential(
        (0): BatchNorm1d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (1): Linear(in_features=800, out_features=29, bias=False)
      ))
    )
    (inference_softmax): InferenceBatchSoftmax()
  )
)
Number of parameters: 38067968
/home/ml/karora2/cont_entropy/asr/deepspeech.pytorch/model.py:67: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
  x, _ = self.rnn(x)
train.py:304: UserWarning: torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
  torch.nn.utils.clip_grad_norm(model.parameters(), args.max_norm)
Epoch: [1][1/45]	Time 3.612 (3.612)	Data 0.239 (0.239)	Loss 130.2365 (130.2365)	
Epoch: [1][2/45]	Time 0.160 (1.886)	Data 0.003 (0.121)	Loss 121.0078 (125.6221)	
Epoch: [1][3/45]	Time 0.172 (1.315)	Data 0.004 (0.082)	Loss 99.7183 (116.9875)	
Epoch: [1][4/45]	Time 0.181 (1.031)	Data 0.003 (0.062)	Loss 90.0657 (110.2571)	
Epoch: [1][5/45]	Time 0.185 (0.862)	Data 0.004 (0.051)	Loss 73.9919 (103.0040)	
Epoch: [1][6/45]	Time 0.192 (0.750)	Data 0.000 (0.042)	Loss 77.1043 (98.6874)	
Epoch: [1][7/45]	Time 0.193 (0.671)	Data 0.002 (0.037)	Loss 73.6352 (95.1085)	
Epoch: [1][8/45]	Time 0.201 (0.612)	Data 0.002 (0.032)	Loss 62.7453 (91.0631)	
Epoch: [1][9/45]	Time 0.203 (0.567)	Data 0.002 (0.029)	Loss 65.0267 (88.1702)	
Epoch: [1][10/45]	Time 0.216 (0.532)	Data 0.002 (0.026)	Loss 62.0836 (85.5615)	
Epoch: [1][11/45]	Time 0.223 (0.504)	Data 0.002 (0.024)	Loss 52.0819 (82.5179)	
Epoch: [1][12/45]	Time 0.222 (0.480)	Data 0.002 (0.022)	Loss 57.3895 (80.4239)	
Epoch: [1][13/45]	Time 0.230 (0.461)	Data 0.002 (0.021)	Loss 60.3037 (78.8762)	
Epoch: [1][14/45]	Time 0.229 (0.444)	Data 0.002 (0.019)	Loss 76.2436 (78.6881)	
Epoch: [1][15/45]	Time 0.239 (0.431)	Data 0.002 (0.018)	Loss 70.9544 (78.1726)	
Epoch: [1][16/45]	Time 0.238 (0.419)	Data 0.002 (0.017)	Loss 65.8857 (77.4046)	
Epoch: [1][17/45]	Time 0.244 (0.408)	Data 0.002 (0.016)	Loss 66.5676 (76.7672)	
Epoch: [1][18/45]	Time 0.253 (0.400)	Data 0.002 (0.016)	Loss 52.6424 (75.4269)	
Epoch: [1][19/45]	Time 0.252 (0.392)	Data 0.003 (0.015)	Loss 61.6683 (74.7028)	
Epoch: [1][20/45]	Time 0.260 (0.385)	Data 0.002 (0.014)	Loss 61.2333 (74.0293)	
Epoch: [1][21/45]	Time 0.260 (0.379)	Data 0.002 (0.014)	Loss 63.9080 (73.5473)	
Epoch: [1][22/45]	Time 0.264 (0.374)	Data 0.002 (0.013)	Loss 66.6633 (73.2344)	
Epoch: [1][23/45]	Time 0.266 (0.369)	Data 0.002 (0.013)	Loss 63.8530 (72.8265)	
Epoch: [1][24/45]	Time 0.275 (0.366)	Data 0.002 (0.012)	Loss 70.0406 (72.7104)	
Epoch: [1][25/45]	Time 0.279 (0.362)	Data 0.003 (0.012)	Loss 68.1581 (72.5283)	
Epoch: [1][26/45]	Time 0.287 (0.359)	Data 0.002 (0.011)	Loss 67.8396 (72.3480)	
Epoch: [1][27/45]	Time 0.288 (0.357)	Data 0.002 (0.011)	Loss 63.9558 (72.0372)	
Epoch: [1][28/45]	Time 0.294 (0.354)	Data 0.002 (0.011)	Loss 64.3709 (71.7634)	
Epoch: [1][29/45]	Time 0.301 (0.352)	Data 0.002 (0.011)	Loss 73.2265 (71.8139)	
Epoch: [1][30/45]	Time 0.308 (0.351)	Data 0.002 (0.010)	Loss 51.9908 (71.1531)	
Epoch: [1][31/45]	Time 0.309 (0.350)	Data 0.002 (0.010)	Loss 70.9007 (71.1449)	
Epoch: [1][32/45]	Time 0.315 (0.349)	Data 0.002 (0.010)	Loss 73.4481 (71.2169)	
Epoch: [1][33/45]	Time 0.320 (0.348)	Data 0.003 (0.010)	Loss 62.2275 (70.9445)	
Epoch: [1][34/45]	Time 0.331 (0.347)	Data 0.002 (0.009)	Loss 86.2832 (71.3956)	
Epoch: [1][35/45]	Time 0.337 (0.347)	Data 0.002 (0.009)	Loss 73.4765 (71.4551)	
Epoch: [1][36/45]	Time 0.351 (0.347)	Data 0.002 (0.009)	Loss 77.1708 (71.6139)	
Epoch: [1][37/45]	Time 0.360 (0.347)	Data 0.002 (0.009)	Loss 87.4664 (72.0423)	
Epoch: [1][38/45]	Time 0.366 (0.348)	Data 0.002 (0.009)	Loss 79.0521 (72.2268)	
Epoch: [1][39/45]	Time 0.380 (0.349)	Data 0.002 (0.008)	Loss 71.7237 (72.2139)	
Epoch: [1][40/45]	Time 0.386 (0.350)	Data 0.002 (0.008)	Loss 73.2224 (72.2391)	
Epoch: [1][41/45]	Time 0.395 (0.351)	Data 0.002 (0.008)	Loss 68.6490 (72.1515)	
Epoch: [1][42/45]	Time 0.416 (0.352)	Data 0.002 (0.008)	Loss 60.9438 (71.8847)	
Epoch: [1][43/45]	Time 0.445 (0.354)	Data 0.002 (0.008)	Loss 83.2680 (72.1494)	
Epoch: [1][44/45]	Time 0.494 (0.358)	Data 0.002 (0.008)	Loss 64.0922 (71.9663)	
Epoch: [1][45/45]	Time 0.521 (0.361)	Data 0.002 (0.008)	Loss 89.0631 (72.2903)	
Training Summary Epoch: [1]	Time taken (s): 16	Average Loss 72.346	
  0%|                                                                                                                                                            | 0/45 [00:00<?, ?it/s]
train.py:344: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  inputs = Variable(inputs, volatile=True)

Traceback (most recent call last):
  File "train.py", line 360, in <module>
    decoded_output, _ = decoder.decode(out.data, sizes)
  File "/home/ml/karora2/cont_entropy/asr/deepspeech.pytorch/decoder.py", line 196, in decode
    remove_repetitions=True, return_offsets=True)
  File "/home/ml/karora2/cont_entropy/asr/deepspeech.pytorch/decoder.py", line 156, in convert_to_strings
    string, string_offsets = self.process_string(sequences[x], seq_len, remove_repetitions)
  File "/home/ml/karora2/cont_entropy/asr/deepspeech.pytorch/decoder.py", line 169, in process_string
    char = self.int_to_char[sequence[i]]
KeyError: tensor(0, device='cuda:0')

Any help would be appreciated.
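
For context on that KeyError: starting with PyTorch 0.4, indexing a 1-D tensor returns a 0-dimensional tensor rather than a Python number, so it no longer matches the integer keys of int_to_char. Here is a minimal sketch that reproduces the failure mode and the .item() workaround; the dictionary contents are made up for illustration and are not taken from deepspeech.pytorch:

import torch

int_to_char = {0: '_', 1: 'a', 2: 'b'}   # toy label map keyed by plain Python ints

device = 'cuda' if torch.cuda.is_available() else 'cpu'
sequence = torch.tensor([0, 2, 1], device=device)

idx = sequence[0]                  # a 0-dim tensor on PyTorch >= 0.4, e.g. tensor(0, device='cuda:0')
# int_to_char[idx]                 # raises KeyError: tensor(0, ...) exactly like the traceback above
print(int_to_char[idx.item()])     # .item() returns a Python int, so the lookup succeeds and prints '_'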

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

19 reactions
XuesongYang commented on May 4, 2018

This is an obvious error: the keys of the dict self.int_to_char are plain Python integers, not tensors. Therefore, you can change the source code at line 169 to:

self.int_to_char[sequence[i].item()]

Should be fine.
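
In context, the patched loop inside process_string in decoder.py might look roughly as follows. This is a sketch reconstructed from the traceback rather than the exact upstream source; names such as blank_index are assumptions, and only the .item() calls are the actual change:

def process_string(self, sequence, size, remove_repetitions=False):
    string = ''
    offsets = []
    for i in range(size):
        # sequence[i] is a 0-dim tensor on PyTorch >= 0.4; .item() converts it to a
        # Python int so it matches the integer keys of self.int_to_char
        char = self.int_to_char[sequence[i].item()]
        if char != self.int_to_char[self.blank_index]:
            # optionally collapse immediate repetitions of the same character
            if remove_repetitions and i != 0 and char == self.int_to_char[sequence[i - 1].item()]:
                continue
            string += char
            offsets.append(i)
    return string, offsets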

0 reactions
Vanitha-019076803E commented on Aug 2, 2021

This is an obvious error: the keys of the dict self.int_to_char are plain Python integers, not tensors. Therefore, you can change the source code at line 169 to:

self.int_to_char[sequence[i].item()]

Should be fine.

Thank you very much
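
Unrelated to the crash, the log also shows two PyTorch 0.4 deprecation warnings, and each already names its replacement. A minimal, generic sketch of the newer idioms (the tiny model below is just a placeholder, not anything from train.py):

import torch

model = torch.nn.Linear(4, 2)            # stand-in model, not the DeepSpeech network
inputs = torch.randn(3, 4)

# volatile=True was removed in PyTorch 0.4; wrap validation/inference in torch.no_grad() instead
with torch.no_grad():
    _ = model(inputs)

# clip_grad_norm is deprecated in favor of the in-place clip_grad_norm_
loss = model(inputs).sum()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=400)   # 400 is an illustrative value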

