
Very high WER on test-other for CRDNN LibriSpeech model

See original GitHub issue

Hi, so I was experimenting with the pretrained CRDNN model on the English LibriSpeech data. While performance is quite good on the test-clean set, it is very bad on the noisy test-other set.
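
For reference, a minimal sketch of how such an evaluation can be run with SpeechBrain's pretrained interface; the Hugging Face source string below is the CRDNN checkpoint linked at the bottom of this page, and the audio path is a placeholder:

from speechbrain.pretrained import EncoderDecoderASR

# Load a pretrained CRDNN + LM checkpoint from the Hugging Face Hub
# (swap in whichever CRDNN checkpoint you are actually evaluating).
asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-transformerlm-librispeech",
    savedir="pretrained_models/asr-crdnn-transformerlm-librispeech",
)

# Transcribe a single utterance (placeholder path).
print(asr_model.transcribe_file("path/to/test-other/utterance.flac"))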

Looking at the WER stats from the pretrained model on Google Drive:

test-clean

%WER 3.09 [ 1622 / 52576, 167 ins, 171 del, 1284 sub ]
%SER 33.66 [ 882 / 2620 ]
Scored 2620 sentences, 0 not present in hyp.
================================================================================
ALIGNMENTS

Format:
<utterance-id>, WER DETAILS
<eps> ; reference  ; on ; the ; first ;  line
  I   ;     S      ; =  ;  =  ;   S   ;   D  
 and  ; hypothesis ; on ; the ; third ; <eps>
================================================================================
672-122797-0033, %WER 0.00 [ 0 / 2, 0 ins, 0 del, 0 sub ]
A ; STORY
= ;   =  
A ; STORY
================================================================================
2094-142345-0041, %WER 0.00 [ 0 / 1, 0 ins, 0 del, 0 sub ]
DIRECTION
    =    
DIRECTION
================================================================================
2830-3980-0026, %WER 50.00 [ 1 / 2, 0 ins, 0 del, 1 sub ]
VERSE ; TWO
  S   ;  = 
FIRST ; TWO
================================================================================
237-134500-0025, %WER 50.00 [ 1 / 2, 0 ins, 0 del, 1 sub ]
OH ;  EMIL
=  ;   S  
OH ; AMIEL

(cut for brevity)

test-other

%WER 219.23 [ 114 / 52, 64 ins, 0 del, 50 sub ]
%SER 100.00 [ 16 / 16 ]
Scored 16 sentences, 0 not present in hyp.

2414-128291-0020, %WER 700.00 [ 7 / 1, 6 ins, 0 del, 1 sub ]
WELL ; <eps> ; <eps> ; <eps> ; <eps>  ;  <eps>  ; <eps>
 S   ;   I   ;   I   ;   I   ;   I    ;    I    ;   I  
 I   ; DON'T ;  KNOW ;  WHAT ; YOU'RE ; TALKING ; ABOUT
================================================================================
7902-96592-0020, %WER 700.00 [ 7 / 1, 6 ins, 0 del, 1 sub ]
NONSENSE ; <eps> ; <eps> ; <eps> ; <eps>  ;  <eps>  ; <eps>
   S     ;   I   ;   I   ;   I   ;   I    ;    I    ;   I  
   I     ; DON'T ;  KNOW ;  WHAT ; YOU'RE ; TALKING ; ABOUT
================================================================================
8188-269290-0057, %WER 125.00 [ 5 / 4, 2 ins, 0 del, 3 sub ]
<eps> ; <eps>  ; I ;  WILL ;  TELL  ; HER
  I   ;   I    ; = ;   S   ;   S    ;  S 
 I'M  ; AFRAID ; I ; CAN'T ; AFFORD ;  IT
================================================================================
3538-142836-0023, %WER 600.00 [ 6 / 1, 5 ins, 0 del, 1 sub ]
ICES ; <eps>  ; <eps> ; <eps> ; <eps>  ; <eps>
 S   ;   I    ;   I   ;   I   ;   I    ;   I  
I'M  ; AFRAID ;   I   ; CAN'T ; AFFORD ;   IT 

(cut for brevity)
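
As a sanity check on how these %WER lines are computed: WER is (insertions + deletions + substitutions) divided by the number of reference words, which is why it can exceed 100% when the decoder inserts many extra words, as in the hallucinated "I DON'T KNOW WHAT YOU'RE TALKING ABOUT" hypotheses above. Reproducing the two headline numbers:

# WER = (ins + del + sub) / reference word count, using the stats above
ins, dels, subs, n_ref = 167, 171, 1284, 52576   # test-clean
print(f"test-clean: {100 * (ins + dels + subs) / n_ref:.2f}%")   # 3.09
ins, dels, subs, n_ref = 64, 0, 50, 52           # test-other (16-sentence run)
print(f"test-other: {100 * (ins + dels + subs) / n_ref:.2f}%")   # 219.23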

Why is the performance so bad on the test-other set? I also noticed that performance is really bad on any audio set outside of test-clean. For example, I tried it on SLR70 and the average WER is also well above 100.
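
A minimal sketch of how an external set like SLR70 could be scored; the manifest format, file names, and the use of the jiwer package are illustrative assumptions, not taken from the issue:

import jiwer
from speechbrain.pretrained import EncoderDecoderASR

asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-transformerlm-librispeech",
    savedir="pretrained_models/asr-crdnn-transformerlm-librispeech",
)

refs, hyps = [], []
with open("slr70_manifest.tsv") as f:   # hypothetical "audio<TAB>reference" manifest
    for line in f:
        wav_path, reference = line.rstrip("\n").split("\t")
        refs.append(reference.upper())  # crude normalization to match the model's casing
        hyps.append(asr_model.transcribe_file(wav_path))

# Corpus-level WER over all utterances
print(f"average WER: {100 * jiwer.wer(refs, hyps):.2f}%")

Note that mismatched text normalization (casing, punctuation, numerals) between references and hypotheses can itself inflate WER well past the model's true error rate, so it is worth ruling that out before blaming the acoustic model.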

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 10

Top GitHub Comments

1 reaction
TParcollet commented, May 16, 2022

Ok, I can’t today because a problem occurred on our lab cluster ahah

0 reactions
KacperKubara commented, May 17, 2022

Thanks for adding the new model! I just saw that it has been updated (https://drive.google.com/drive/folders/19mAyMR1ITSb83Anhds4n694PLwKD47yf?usp=sharing). It looks good now. I will test it, and if there are any problems I will reopen the issue.

Read more comments on GitHub >

Top Results From Across the Web

speechbrain/asr-crdnn-transformerlm-librispeech
CRDNN with CTC/Attention and RNNLM trained on LibriSpeech ... Release, Test clean WER, Test other WER, GPUs ... Acoustic model (CRDNN + CTC/Attention)...

LibriSpeech test-other Benchmark (Speech Recognition)
Rank 1: w2v‑BERT XXL, WER 2.5 (2021); Rank 3: HuBERT with Libri‑Light, WER 2.9 (2021); Rank 4: Conv + Transformer + wav2vec2.0 + pseudo...

arXiv:2110.08583v1 [eess.AS] 16 Oct 2021
model pretrained without any label can reach a WER as low as 8.2 on Librispeech test-other and only 10h are necessary to go...

LibriSpeech ASR corpus - openslr.org
Summary: Large-scale (1000 hours) corpus of read English speech. Category: Speech. License: CC BY 4.0. Downloads (use a mirror closer to you):

Robust Speech Recognition via Large-Scale Weak Supervision
the smallest zero-shot Whisper model, which has only 39 million parameters and a 6.7 WER on LibriSpeech test-clean is roughly competitive with the...
