Czech data preparation for ASR: ffmpeg with a pipe causes get_utt2dur.sh to crash

The data preparation for the Czech language uses ffmpeg with a pipe in wav.scp. When ffmpeg writes WAV to a pipe, it cannot store the audio length in the header.

The wav.scp command is created here: https://github.com/espnet/espnet/blob/a5742a3b23d8a27c0c0ef02d105e1ab9d6321e08/egs/commonvoice/asr1/local/data_prep.pl#L57

The underlying FFmpeg issue is described at: https://trac.ffmpeg.org/ticket/7892

The error in the output of run.sh (script level):

# script level for run.sh
utils/data/get_utt2dur.sh: could not get utterance lengths from sphere-file headers, using wav-to-duration
run.pl: 4 / 4 failed, log is in data/train_cs/log/get_durations.*.log

Demonstration of the error with and without the FFmpeg pipe. Compare the Duration lines.

test_bad.wav was written through stdout redirection (the same behavior as with a pipe).

test_ok.wav was saved to disk by FFmpeg directly. All other FFmpeg parameters are the same.

oplatek@hydra4:master-replicate-czech:asr1$ ffmpeg -i download/cs_data/cv-corpus-5.1-2020-06-22/cs/clips/common_voice_cs_20500128.mp3 -ar 16000 -acodec pcm_s16le -ac 1 -f wav - > test_bad.wav 2> /dev/null
oplatek@hydra4:master-replicate-czech:asr1$ ffmpeg -i download/cs_data/cv-corpus-5.1-2020-06-22/cs/clips/common_voice_cs_20500128.mp3 -ar 16000 -acodec pcm_s16le -ac 1 -f wav test_ok.wav 2> /dev/null
oplatek@hydra4:master-replicate-czech:asr1$ soxi test_bad.wav test_ok.wav

Input File     : 'test_bad.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 37:16:57.73 = 2147483647 samples ~ 1.00663e+07 CDDA sectors
File Size      : 96.8k
Bit Rate       : 5.77
Sample Encoding: 16-bit Signed Integer PCM


Input File     : 'test_ok.wav'
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Duration       : 00:00:03.02 = 48384 samples ~ 226.8 CDDA sectors
File Size      : 96.8k
Bit Rate       : 256k
Sample Encoding: 16-bit Signed Integer PCM

Total Duration of 2 files: 37:17:00.75
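
The bogus duration above can be reproduced with simple arithmetic. The following is a minimal sketch, not FFmpeg's or SoX's actual code. It assumes that a WAV writer which cannot seek back leaves the placeholder 0xFFFFFFFF in the header size fields; that assumption is consistent with the soxi output, since 0xFFFFFFFF bytes at 2 bytes per sample is 2147483647 samples. The helper names build_wav and durations are hypothetical, and a bare 44-byte header is assumed (real FFmpeg output may contain extra chunks).

```python
import struct

# Assumption: value left in the size fields when the output cannot be rewound.
PLACEHOLDER = 0xFFFFFFFF
HEADER_FMT = "<4sI4s4sIHHIIHH4sI"  # minimal 44-byte PCM WAV header


def build_wav(pcm, sample_rate=16000, channels=1, bits=16, patch_sizes=True):
    """Build a minimal PCM WAV; with patch_sizes=False the size fields stay unpatched."""
    block_align = channels * bits // 8
    byte_rate = sample_rate * block_align
    riff_size = 36 + len(pcm) if patch_sizes else PLACEHOLDER
    data_size = len(pcm) if patch_sizes else PLACEHOLDER
    header = struct.pack(
        HEADER_FMT,
        b"RIFF", riff_size, b"WAVE",
        b"fmt ", 16, 1, channels, sample_rate, byte_rate, block_align, bits,
        b"data", data_size,
    )
    return header + pcm


def durations(wav):
    """Return (seconds according to the header, seconds according to the payload)."""
    fields = struct.unpack(HEADER_FMT, wav[:44])
    sample_rate, block_align, data_size = fields[7], fields[9], fields[12]
    header_samples = data_size // block_align
    actual_samples = (len(wav) - 44) // block_align
    return header_samples / sample_rate, actual_samples / sample_rate


# 48384 samples of 16-bit silence, the same length as test_ok.wav above.
pcm = bytes(48384 * 2)
print(durations(build_wav(pcm)))                     # header and payload agree
print(durations(build_wav(pcm, patch_sizes=False)))  # header value is bogus
```

Under these assumptions the unpatched header yields 2147483647 samples, about 134217.73 seconds, which is the 37:16:57.73 that soxi reports for test_bad.wav, while the payload itself is still about 3.02 seconds long.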

Workaround: save the files as WAV first.

Any ideas on how to make it work without saving the WAV files first?

Top GitHub Comments

oplatek commented, Mar 4, 2021

FYI: I have managed to run the first two stages successfully with the workaround from https://github.com/kaldi-asr/kaldi/pull/4467/files.

Feel free to close this issue. I hit problems with LM training in stage 3, so I moved to the ESPnet2 recipe in egs2/commonvoice/asr1 as you suggested. Thank you! I hit another issue, which I reported here: https://github.com/espnet/espnet/issues/3042

kamo-naoyuki commented, Mar 3, 2021

> FYI: I used this simple workaround https://github.com/kaldi-asr/kaldi/pull/4467/files and set --read-entire-file to true when running get_utt2dur.sh on the data before speed perturbation.

Good, this is better for us, thank you!

> Btw: I think that FFmpeg could potentially solve it by reading the MP3 twice (I assume some CLI option would be needed, e.g. --compute-length) or by using information from a header. Unfortunately, the MP3 header does not contain duration information.

Potentially yes, but ffmpeg may not provide such an option; I’m not sure. Another idea is to write a small reading tool that does it. This is not difficult.
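
A reading tool of the kind suggested here could look roughly like the sketch below. It is an illustration only, not the Kaldi or ESPnet implementation: it reads a PCM WAV from any binary stream, takes the sample rate and block size from the fmt chunk, ignores the size stored in the data chunk, and counts the payload bytes until end of stream. The function names duration_from_stream and _read_exact are hypothetical, and the sketch assumes uncompressed PCM with the data chunk as the last chunk.

```python
import struct


def _read_exact(stream, n):
    """Read exactly n bytes from a possibly non-seekable stream."""
    data = b""
    while len(data) < n:
        part = stream.read(n - len(data))
        if not part:
            raise EOFError("unexpected end of stream")
        data += part
    return data


def duration_from_stream(stream):
    """Return the duration in seconds of a PCM WAV, counting the actual payload bytes."""
    riff, _, wave = struct.unpack("<4sI4s", _read_exact(stream, 12))
    if riff != b"RIFF" or wave != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")

    sample_rate = block_align = None
    while True:
        chunk_id, chunk_size = struct.unpack("<4sI", _read_exact(stream, 8))
        if chunk_id == b"data":
            break  # the stored size is not trusted; the payload is counted below
        # Chunk bodies are padded to an even number of bytes.
        body = _read_exact(stream, chunk_size + (chunk_size & 1))
        if chunk_id == b"fmt ":
            _, _, sample_rate, _, block_align = struct.unpack("<HHIIH", body[:14])

    if sample_rate is None:
        raise ValueError("no fmt chunk found before the data chunk")

    payload = 0
    while True:
        buf = stream.read(65536)
        if not buf:
            break
        payload += len(buf)
    return (payload // block_align) / sample_rate
```

Passing sys.stdin.buffer as the stream would let the function sit at the end of a pipe. The cost is that every file is decoded and read in full just to learn its length, which is the trade-off of reading the entire stream instead of trusting the header.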