Bad performance on real audio

Hi @leo19941227

I was able to get the code running and do some exploration on ASR. However, the performance is very poor, and I’m trying to figure out what the problem could be. One possibility is the configuration I used. I would really appreciate it if you could take a look and share any suggestions or hints.

python run_downstream.py -m=inference  -c='./downstream/asr/config.yaml' -d=asr -t='mani.flac' -p='result/wav2vec2_hug_base_960_final' -n='result/downstream/asr_wav2vec2_hug_base_960' -i='result/downstream/wav2vec2_hug_base_960_final/dev-clean-best.ckpt' -e='result/downstream/wav2vec2_hug_base_960_final/dev-clean-best.ckpt' -u='result/downstream/wav2vec2_hug_base_960' -s=hidden_states

And this is the text:

No, the cheaper option would be great though.

No problem. How about a flight that leaves from Seattle to Paris on May 15 at 3 PM. Your next flight will be on May 19 at 10 PM to London.

And this is the transcription from the model:

NO CHIEF WOULD BE GREAT 

NO PRO HOW BUT A FLIGHT THAT LEAVES FROM SALO TO PARIS ON MAY FIFTEEN THREE YOUR NEXT FLIGHT WILL BE ON MAY NINETEEN AT TEN P TO LONDON

Thanks for your help~
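
To put a number on "too bad", one option is to compute the word error rate (WER) between the reference text and the model output. The sketch below is a minimal, standard-library version using word-level edit distance; the helper names `normalize` and `word_error_rate` are made up for this example and are not part of the toolkit. Note that it does not convert digits to words, so "15" versus "FIFTEEN" would count as an error unless the reference is normalized first.

```python
import re


def normalize(text):
    """Uppercase the text, replace punctuation with spaces, and split into words."""
    return re.sub(r"[^A-Z0-9' ]+", " ", text.upper()).split()


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    # prev[j] = edit distance between the words seen so far in ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / max(len(ref), 1)


# First utterance from the post above
reference = "No, the cheaper option would be great though."
hypothesis = "NO CHIEF WOULD BE GREAT"
print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

Comparing this number across settings (with and without a language model, or with a different upstream model) gives a more reliable signal than eyeballing the transcripts.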

Issue Analytics

State:
Created a year ago
Comments: 11 (1 by maintainers)

Top GitHub Comments

2 reactions

Mecoli1219 commented, Nov 12, 2022

Any hint or idea on this?

Hello, @benam2! I recently ran a similar experiment and hit the same issue. I think the problem might be that the default config.yaml for asr doesn’t use a language model in decoding, so the generated sequence contains some weird words. BTW, I ran inference on my own voice (I’m a poor English speaker), so the effect of not using an LM is more obvious.

NO THAT SHEEP HER UPSUN WILL BE QUATE E DOGH
NO PROBIN HOWBOT OF FLIGHT TLAT LEAVES FROM SITOES TO PARIS ON MAIDE FIFTEEN AT THREE P M YOU R NEST FRIGHT WILL BE A MAY NIGHTING AT TEMP IM TO LONDON

I’m not sure whether your problem is the same as mine. You can try setting decoder_type in config.yaml to "kenlm" and running the experiment again. I hope this helps.

1 reaction

Mecoli1219 commented, Nov 18, 2022

@benam2 well, then it may not be the LM problem. I have come up with three ideas:

1. The quality of the sound (noise, amplitude, …).
2. Could the non-native speaker be the main reason? Refer to "Accent modification for speech recognition of non-native speakers using neural style transfer".
3. You can try the task with a different upstream model. (Some said that HuBERT is better?)
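
For the first idea, a quick sanity check on the recording can rule out the obvious problems before changing the model. The sketch below is only an illustration: `audio_report` is a hypothetical helper, the clipping and loudness thresholds are arbitrary assumptions, and it expects the audio already decoded to mono floats in [-1.0, 1.0] (loading the .flac file is left to whatever audio library you use). The 16 kHz expectation comes from the wav2vec 2.0 Base 960h model being pretrained on 16 kHz LibriSpeech audio.

```python
import math

# wav2vec 2.0 Base (960 h) was pretrained on 16 kHz LibriSpeech audio
EXPECTED_SAMPLE_RATE = 16000


def audio_report(samples, sample_rate):
    """Summarize basic properties of a mono signal given as floats in [-1.0, 1.0]."""
    peak = max((abs(s) for s in samples), default=0.0)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0
    return {
        "sample_rate_ok": sample_rate == EXPECTED_SAMPLE_RATE,
        "peak": peak,
        "rms": rms,
        "clipped": peak >= 0.999,   # assumed threshold: samples at full scale
        "too_quiet": peak < 0.05,   # assumed threshold: very low amplitude
    }


# Demo on a synthetic one-second 440 Hz tone at half amplitude (not real speech)
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
report = audio_report(tone, 16000)
print(report)
```

If `sample_rate_ok` is false, resampling the recording to 16 kHz before inference is a reasonable first thing to try, since a mismatched sample rate alone can badly degrade transcripts.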