Speech Recognition Seems To Overfit
Hi, I don’t know if this is an issue with the framework, but I did not know where else to ask. I have been training the speech recognition example (speech.yml) for about 80 epochs on a Titan X, inside a tensorflow-gpu-py3-based Docker image. The training loss has gone way down, but my validation loss is still very high, and the sample predictions it prints are gibberish.
Example output:
Epoch 80/inf, loss=5.845: 100%|##########| 2432/2432 [07:04<00:00, 7.80samples/s]
Validating, loss=766.122: 94%|#########4| 256/271 [00:21<00:01, 13.70samples/s]
Prediction: "thg asi bw a hnta e tb incotnetk rndegibnrtrlan ty bmna ekftett trelaob"
Truth: "and what inquired missus macpherson has mary ann given you her love"
- Is this sort of behavior expected this early in training?
- How long would you expect to have to train this model before it starts producing reasonable results?
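The gap between the two losses above (5.845 on training vs. 766.122 on validation) is the classic overfitting signature. One quick way to quantify how far a prediction is from its transcript is character error rate (CER). A rough sketch, using the sample output from this issue; the hand-rolled `levenshtein` helper is for illustration only and is not part of the framework:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (insert/delete/substitute),
    # keeping only one row of the DP table at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                 # deletion
                cur[j - 1] + 1,              # insertion
                prev[j - 1] + (ca != cb),    # substitution (free on match)
            ))
        prev = cur
    return prev[-1]

def cer(prediction: str, truth: str) -> float:
    # Character error rate: edit distance normalized by reference length.
    return levenshtein(prediction, truth) / max(len(truth), 1)

pred = "thg asi bw a hnta e tb incotnetk rndegibnrtrlan ty bmna ekftett trelaob"
truth = "and what inquired missus macpherson has mary ann given you her love"
print(f"CER: {cer(pred, truth):.2f}")
```

Tracking CER on the validation set alongside the loss makes it obvious whether the model is actually learning transcriptions or just memorizing the training audio.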
Issue Analytics
- Created: 7 years ago
- Comments: 6
Top Results From Across the Web
Do people “cheat” by overfitting test data - Ehud Reiter's Blog
From a scientific perspective, the key thing here is to make the scope clear in our hypothesis and claims. Eg, explicitly say that...
Improving sequence-to-sequence speech recognition training ...
One solution to the overfitting problem is increasing the amount of available training data and the variety exhibited by the training data with ...
What is Overfitting? - IBM
When the model memorizes the noise and fits too closely to the training set, the model becomes “overfitted,” and it is unable to...
Overfitting Mechanism and Avoidance in Deep Neural Networks
By separating samples into correctly and incorrectly classified ones, we show that they behave very differently, where the loss decreases in the correct...
Continuous speech recognition with ESP32 for numbers (0--9)
Overfitting - as it's only my voice? I am using the default 1D model, which is pretty simple and may not be able...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@bharris47 We just put up some bigger data sets. Can you point your train url and checksum to one of these?
The 10-hour set is double the size of the default dataset for the stock speech example. You can keep going up in scale, and by the 50-hour mark you’re bound to start seeing pretty good output.
Note: The above datasets are fractions of the 100-hour LibriSpeech dataset (100% = 100p, 50% = 50p, …). They should not be concatenated, because 100p contains 50p, which contains 20p, which contains 10p.
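Once you have a new train URL and checksum in hand, it’s worth verifying the downloaded tarball before kicking off another long training run. A minimal sketch, assuming the checksum is a SHA-256 hex digest (the stock speech.yml shows the exact field names and algorithm the framework expects):

```python
import hashlib
import os
import tempfile

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so a multi-gigabyte tarball never sits in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo on a throwaway file; in practice, point this at the downloaded
# dataset tarball and compare against the checksum value in your config.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"dataset bytes")
    path = f.name

digest = sha256_of(path)
os.unlink(path)
print(digest)
```

A mismatched digest usually means a truncated or corrupted download, which is cheaper to catch here than 80 epochs in.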
Glad to hear! Feel free to use our data format as a template for adding even more data! It’s just a simple tarball.