
Very high WER on test-other for CRDNN LibriSpeech model

See original GitHub issue

Hi, so I was experimenting with the pretrained CRDNN model on the English LibriSpeech data. While performance is quite good on the test-clean set, it is very bad on the noisy test-other set.
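
For reference, a minimal sketch of how such an evaluation can be run with SpeechBrain's pretrained interface; the Hugging Face source string below is the CRDNN checkpoint linked at the bottom of this page, and the audio path is a placeholder:

from speechbrain.pretrained import EncoderDecoderASR

# Load a pretrained CRDNN + LM checkpoint from the Hugging Face Hub
# (swap in whichever CRDNN checkpoint you are actually evaluating).
asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-transformerlm-librispeech",
    savedir="pretrained_models/asr-crdnn-transformerlm-librispeech",
)

# Transcribe a single utterance (placeholder path).
print(asr_model.transcribe_file("path/to/test-other/utterance.flac"))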

Looking at the WER stats from the pretrained model on Google Drive:

test-clean

%WER 3.09 [ 1622 / 52576, 167 ins, 171 del, 1284 sub ]
%SER 33.66 [ 882 / 2620 ]
Scored 2620 sentences, 0 not present in hyp.
================================================================================
ALIGNMENTS

Format:
<utterance-id>, WER DETAILS
<eps> ; reference  ; on ; the ; first ;  line
  I   ;     S      ; =  ;  =  ;   S   ;   D  
 and  ; hypothesis ; on ; the ; third ; <eps>
================================================================================
672-122797-0033, %WER 0.00 [ 0 / 2, 0 ins, 0 del, 0 sub ]
A ; STORY
= ;   =  
A ; STORY
================================================================================
2094-142345-0041, %WER 0.00 [ 0 / 1, 0 ins, 0 del, 0 sub ]
DIRECTION
    =    
DIRECTION
================================================================================
2830-3980-0026, %WER 50.00 [ 1 / 2, 0 ins, 0 del, 1 sub ]
VERSE ; TWO
  S   ;  = 
FIRST ; TWO
================================================================================
237-134500-0025, %WER 50.00 [ 1 / 2, 0 ins, 0 del, 1 sub ]
OH ;  EMIL
=  ;   S  
OH ; AMIEL

(cut for brevity)

test-other

%WER 219.23 [ 114 / 52, 64 ins, 0 del, 50 sub ]
%SER 100.00 [ 16 / 16 ]
Scored 16 sentences, 0 not present in hyp.

2414-128291-0020, %WER 700.00 [ 7 / 1, 6 ins, 0 del, 1 sub ]
WELL ; <eps> ; <eps> ; <eps> ; <eps>  ;  <eps>  ; <eps>
 S   ;   I   ;   I   ;   I   ;   I    ;    I    ;   I  
 I   ; DON'T ;  KNOW ;  WHAT ; YOU'RE ; TALKING ; ABOUT
================================================================================
7902-96592-0020, %WER 700.00 [ 7 / 1, 6 ins, 0 del, 1 sub ]
NONSENSE ; <eps> ; <eps> ; <eps> ; <eps>  ;  <eps>  ; <eps>
   S     ;   I   ;   I   ;   I   ;   I    ;    I    ;   I  
   I     ; DON'T ;  KNOW ;  WHAT ; YOU'RE ; TALKING ; ABOUT
================================================================================
8188-269290-0057, %WER 125.00 [ 5 / 4, 2 ins, 0 del, 3 sub ]
<eps> ; <eps>  ; I ;  WILL ;  TELL  ; HER
  I   ;   I    ; = ;   S   ;   S    ;  S 
 I'M  ; AFRAID ; I ; CAN'T ; AFFORD ;  IT
================================================================================
3538-142836-0023, %WER 600.00 [ 6 / 1, 5 ins, 0 del, 1 sub ]
ICES ; <eps>  ; <eps> ; <eps> ; <eps>  ; <eps>
 S   ;   I    ;   I   ;   I   ;   I    ;   I  
I'M  ; AFRAID ;   I   ; CAN'T ; AFFORD ;   IT 

(cut for brevity)
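
As a sanity check on how these %WER lines are computed: WER is (insertions + deletions + substitutions) divided by the number of reference words, which is why it can exceed 100% when the decoder inserts many extra words, as in the hallucinated "I DON'T KNOW WHAT YOU'RE TALKING ABOUT" hypotheses above. Reproducing the two headline numbers:

# WER = (ins + del + sub) / reference word count, using the stats above
ins, dels, subs, n_ref = 167, 171, 1284, 52576   # test-clean
print(f"test-clean: {100 * (ins + dels + subs) / n_ref:.2f}%")   # 3.09
ins, dels, subs, n_ref = 64, 0, 50, 52           # test-other (16-sentence run)
print(f"test-other: {100 * (ins + dels + subs) / n_ref:.2f}%")   # 219.23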

Why is the performance so bad on the test-other set? I also noticed that performance is really bad on any audio set outside of test-clean. For example, I tried it on SLR70 and the average WER is also well above 100.
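
A minimal sketch of how an external set like SLR70 could be scored; the manifest format, file names, and the use of the jiwer package are illustrative assumptions, not taken from the issue:

import jiwer
from speechbrain.pretrained import EncoderDecoderASR

asr_model = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-transformerlm-librispeech",
    savedir="pretrained_models/asr-crdnn-transformerlm-librispeech",
)

refs, hyps = [], []
with open("slr70_manifest.tsv") as f:   # hypothetical "audio<TAB>reference" manifest
    for line in f:
        wav_path, reference = line.rstrip("\n").split("\t")
        refs.append(reference.upper())  # crude normalization to match the model's casing
        hyps.append(asr_model.transcribe_file(wav_path))

# Corpus-level WER over all utterances
print(f"average WER: {100 * jiwer.wer(refs, hyps):.2f}%")

Note that mismatched text normalization (casing, punctuation, numerals) between references and hypotheses can itself inflate WER well past the model's true error rate, so it is worth ruling that out before blaming the acoustic model.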

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 10

Top GitHub Comments

1 reaction
TParcollet commented, May 16, 2022

Ok, I can’t today because a problem occurred on our lab cluster ahah

0 reactions
KacperKubara commented, May 17, 2022

Thanks for adding the new model! I just saw that it has been updated (https://drive.google.com/drive/folders/19mAyMR1ITSb83Anhds4n694PLwKD47yf?usp=sharing). It looks good now. I will test it, and if there are any problems I will reopen the issue.

Read more comments on GitHub >

Top Results From Across the Web

speechbrain/asr-crdnn-transformerlm-librispeech
CRDNN with CTC/Attention and RNNLM trained on LibriSpeech ... Release, Test clean WER, Test other WER, GPUs ... Acoustic model (CRDNN + CTC/Attention)...

LibriSpeech test-other Benchmark (Speech Recognition)
Rank 1: w2v‑BERT XXL, WER 2.5 (2021); Rank 3: HuBERT with Libri‑Light, WER 2.9 (2021); Rank 4: Conv + Transformer + wav2vec2.0 + pseudo...

arXiv:2110.08583v1 [eess.AS] 16 Oct 2021
model pretrained without any label can reach a WER as low as 8.2 on Librispeech test-other and only 10h are necessary to go...

LibriSpeech ASR corpus - openslr.org
Summary: Large-scale (1000 hours) corpus of read English speech. Category: Speech. License: CC BY 4.0. Downloads (use a mirror closer to you):

Robust Speech Recognition via Large-Scale Weak Supervision
the smallest zero-shot Whisper model, which has only 39 million parameters and a 6.7 WER on LibriSpeech test-clean is roughly competitive with the...
