The pre-trained model `asr-transformer-transformerlm-librispeech` does not contain all required information
See original GitHub issue

The pre-trained model at
https://huggingface.co/speechbrain/asr-transformer-transformerlm-librispeech
does not contain the state information for `speechbrain.processing.features.InputNormalization`.
As a consequence, the WER at inference time differs if you change the batch size, because the mean and standard deviation are computed from the data within each batch.
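The batch-size dependence described above can be illustrated with a minimal sketch (this is not SpeechBrain's actual code; `normalize_batch` is a hypothetical stand-in for per-batch normalization when no global statistics are loaded):

```python
# Minimal sketch: when global stats are missing, features are normalized
# with per-batch mean/std, so an utterance's normalized features depend
# on which other utterances share its batch.
import numpy as np

def normalize_batch(feats):
    # feats: (batch, time, dim); stats computed over the whole batch
    mean = feats.mean(axis=(0, 1), keepdims=True)
    std = feats.std(axis=(0, 1), keepdims=True)
    return (feats - mean) / (std + 1e-10)

rng = np.random.default_rng(0)
utt = rng.normal(loc=3.0, scale=2.0, size=(1, 50, 80))
other = rng.normal(loc=-1.0, scale=0.5, size=(3, 50, 80))

solo = normalize_batch(utt)[0]                               # batch size 1
grouped = normalize_batch(np.concatenate([utt, other]))[0]   # batch size 4

# Same utterance, different normalized features depending on batch size:
print(np.abs(solo - grouped).max() > 0.1)
```

Because the downstream acoustic model sees different inputs for the same utterance, the decoded hypotheses (and hence WER) shift with batch size.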
Issue Analytics
- State:
- Created 2 years ago
- Comments: 10
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi @danpovey, thank you for the comments. The `InputNormalization` part needs a refactor (e.g., we want to vectorize more); @30stomercury is already working on that (the current code applies `x = (x - self.glob_mean.data) / (self.glob_std.data)`). However, this is something to fix. Thank you again for your help!
Also, regarding the way `InputNormalization` stores per-speaker statistics: I'm not sure that's considered totally normal in terms of testing protocol, since you end up remembering stats from training speakers who might recur at test time. For the global stats, rather than saving/loading them manually, it would be more normal I think to just register them as buffers, e.g.
`self.register_buffer('glob_mean', ...)`
(but you'd need to know the dimension ahead of time, I think). This will also cause DDP to copy the stats of job 0 to all other jobs (not that this should matter much for the global stats).
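The buffer suggestion above can be sketched as follows (a minimal illustration, not SpeechBrain's actual implementation; the class name `GlobalNorm` and the feature dimension 80 are assumptions):

```python
# Sketch of registering global normalization stats as buffers, so they
# are saved/loaded with the model's state_dict and synced from rank 0
# under DistributedDataParallel.
import torch

class GlobalNorm(torch.nn.Module):
    def __init__(self, dim):
        super().__init__()
        # Buffers persist in state_dict() (unlike plain tensor attributes),
        # but the feature dimension must be known ahead of time.
        self.register_buffer("glob_mean", torch.zeros(dim))
        self.register_buffer("glob_std", torch.ones(dim))

    def forward(self, x):
        # x: (batch, time, dim); normalize with the stored global stats
        return (x - self.glob_mean) / self.glob_std

norm = GlobalNorm(80)
print("glob_mean" in norm.state_dict())  # buffers travel with the checkpoint
```

With the stats stored this way, a pretrained checkpoint would carry them automatically, and inference would no longer fall back to per-batch statistics.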