Question about LayerNorm in Wav2vec
Hi @TParcollet,
I have a question about the layer normalization used in wav2vec, in both the HF and fairseq implementations. Why is it applied over all (batch_size, sequence_length, hidden_size) dimensions of the wav2vec `last_hidden_state` output, instead of only over the hidden_size dimension, as shown in the examples of the PyTorch docs? (In that case it might better be called InstanceNorm.)
Thanks in advance!
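For reference, here is a minimal NumPy sketch (not the actual HF/fairseq code) of the difference the question is asking about: normalizing each position over the hidden dimension only, as `torch.nn.LayerNorm(hidden_size)` does, versus normalizing each utterance jointly over time and hidden dimensions:

```python
import numpy as np

def layer_norm_last_dim(x, eps=1e-5):
    # Normalize each (batch, time) position over hidden_size only,
    # as torch.nn.LayerNorm(hidden_size) does (before the affine w, b).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def norm_all_time_hidden(x, eps=1e-5):
    # Normalize each utterance jointly over (sequence_length, hidden_size),
    # the instance-norm-like behaviour the question describes.
    mu = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
h = rng.normal(size=(2, 50, 768))  # (batch, seq_len, hidden)

a = layer_norm_last_dim(h)
b = norm_all_time_hidden(h)

# Per-position statistics differ: the last-dim norm gives zero mean at
# every (batch, time) position, while the joint norm only guarantees
# zero mean per utterance.
print(np.allclose(a.mean(axis=-1), 0, atol=1e-6))  # True
print(np.allclose(b.mean(axis=-1), 0, atol=1e-6))  # False
```

Note this sketch omits the learned affine parameters `w` and `b`; it only illustrates which axes the statistics are computed over.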
Issue Analytics
- State:
- Created: a year ago
- Comments: 9 (4 by maintainers)
Top Results From Across the Web
- speechbrain - Question about LayerNorm in Wav2vec: I have a question about the layer normalization used in wav2vec for both HF and Fairseq implementations. Why using it on all (batch_size, ...
- How should we normalize the audio signal for wav2vec 2.0 ...: First, I noticed that only the large configuration includes the normalisation: True YAML parameter. Hence, my first question is: why not both?
- arXiv:2010.12829v4 [cs.CL] 2 Jan 2021: Our key finding is that a minimalistic LNA (LayerNorm and Attention) finetuning can achieve zero-shot cross-lingual and cross-modality transfer ...
- How does Wav2Vec 2.0 feed output from Convolutional ...: Therefore this implies that the output from the Convolutional Feature Encoder will have varying lengths across batches. However, the Transformer ...
- Wav2Vec 2.0: Learning Speech Representations via Self ...: The raw speech is passed through a feature encoder (temporal CNN blocks + layer norm + GeLU) ... Image from original paper by authors.
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi @TParcollet,
I think I've made some mistakes here. Normalizing each wav is fine, but it should not include the padded values. I forgot that `wav` is a padded batch... It would be better to do this somewhere in the dataloader, or to pass `wav_lens` to the model and normalize afterwards. Also, layer normalization has already been applied in HF and fairseq (with trained `w` and `b`). Should we do it one more time here?
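To illustrate the point about padding, here is a hypothetical sketch (the helper name `normalize_wavs` is made up; only `wav` and `wav_lens` come from the discussion above) of per-utterance normalization that excludes the zero-padded tail, assuming `wav_lens` holds relative lengths as SpeechBrain uses them:

```python
import numpy as np

def normalize_wavs(wavs, wav_lens, eps=1e-8):
    # Hypothetical helper: per-utterance mean/variance normalization that
    # computes statistics only over the valid samples, leaving padding at zero.
    batch, max_len = wavs.shape
    lens = np.round(wav_lens * max_len).astype(int)  # relative -> absolute lengths
    out = np.zeros_like(wavs)
    for i, n in enumerate(lens):
        valid = wavs[i, :n]
        out[i, :n] = (valid - valid.mean()) / (valid.std() + eps)
    return out

rng = np.random.default_rng(1)
wavs = np.zeros((2, 100))
wavs[0, :100] = rng.normal(2.0, 1.0, 100)  # full-length utterance
wavs[1, :60] = rng.normal(2.0, 1.0, 60)    # utterance padded with 40 zeros

out = normalize_wavs(wavs, np.array([1.0, 0.6]))

# The padded region stays zero, and the statistics of the valid part are
# not skewed by the padding.
print(np.allclose(out[1, 60:], 0))      # True
print(abs(out[1, :60].mean()) < 1e-6)   # True
```

Computing the mean and variance over the full padded row would instead pull the mean toward zero for short utterances, which is the mistake described above.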