
Question about LayerNorm in Wav2vec

See original GitHub issue

Hi @TParcollet ,

I have a question about the layer normalization used in wav2vec in both the HF and Fairseq implementations. Why apply it over all (batch_size, sequence_length, hidden_size) dimensions of the wav2vec last_hidden_state output, instead of only over the hidden_size dimension as shown in the examples of the PyTorch docs here? (Maybe it would be better to call the former InstanceNorm.)

Thanks in advance !

https://github.com/speechbrain/speechbrain/blob/424e7921531b0ea6523557ef0fd6ca249936bd26/speechbrain/lobes/models/huggingface_wav2vec.py#L280-L285

https://github.com/speechbrain/speechbrain/blob/424e7921531b0ea6523557ef0fd6ca249936bd26/speechbrain/lobes/models/fairseq_wav2vec.py#L172-L176
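For concreteness, the two normalization variants being compared can be sketched with `torch.nn.functional.layer_norm` (a minimal illustration with made-up shapes, not the SpeechBrain code itself):

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes for a wav2vec 2.0 last_hidden_state tensor.
batch_size, seq_len, hidden_size = 2, 50, 768
x = torch.randn(batch_size, seq_len, hidden_size)

# Variant discussed in the issue: statistics computed over ALL dimensions,
# including the batch dimension (a single mean/std for the whole tensor).
y_all = F.layer_norm(x, x.shape)

# Variant from the PyTorch docs: statistics over hidden_size only,
# computed independently at every (batch, time) position.
y_hidden = F.layer_norm(x, (hidden_size,))

# Both keep the shape, but the per-position statistics differ.
assert y_all.shape == y_hidden.shape == (batch_size, seq_len, hidden_size)
```

With the second variant, each (batch, time) position is normalized independently; with the first, statistics are shared across the whole padded batch, which is what the question is pointing at.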

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

1 reaction
TParcollet commented, Sep 26, 2022
  1. It can be done in the dataloader. But I already noted that we should add our padding-compatible LayerNorm to these recipes. Honestly, I wouldn’t expect this to impact the results too much … it’s adding regularization.
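The padding-compatible LayerNorm mentioned above could look something like the following sketch (a hypothetical helper, not the SpeechBrain implementation): statistics are computed per utterance, over valid frames only, using the relative `wav_lens` convention SpeechBrain recipes use.

```python
import torch

def masked_layer_norm(x, wav_lens, eps=1e-5):
    """Padding-aware layer norm sketch (hypothetical, for illustration).

    x:        (batch, time, hidden) features.
    wav_lens: relative lengths in [0, 1], as in SpeechBrain recipes.
    Mean/variance are computed per utterance over valid frames only.
    """
    batch, time, hidden = x.shape
    abs_lens = (wav_lens * time).round().long()                # valid frames per utterance
    mask = torch.arange(time)[None, :] < abs_lens[:, None]     # (batch, time)
    mask = mask.unsqueeze(-1).to(x.dtype)                      # (batch, time, 1)

    n = mask.sum(dim=(1, 2), keepdim=True) * hidden            # valid element count
    mean = (x * mask).sum(dim=(1, 2), keepdim=True) / n
    var = ((x - mean) ** 2 * mask).sum(dim=(1, 2), keepdim=True) / n
    return (x - mean) / (var + eps).sqrt() * mask              # padded frames stay zero

x = torch.randn(2, 10, 4)
wav_lens = torch.tensor([1.0, 0.5])  # second utterance is half padding
y = masked_layer_norm(x, wav_lens)
```

Padded positions are excluded from the statistics and zeroed in the output, so padding no longer leaks into the per-utterance mean and variance.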
0 reactions
bofenghuang commented, Sep 20, 2022

Hi @TParcollet ,

I think I’ve made some mistakes here.

  1. Normalizing each wav is OK, but it should not include the padded values. I had forgotten that the wav is a padded batch… It would be better to do it somewhere in the dataloader, or to pass wav_lens to the model and do it there.

  2. Layer normalization has already been applied in HF and fairseq (with trained weights w and b). Should we do it one more time here?
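Point 1 above, normalizing each raw waveform while ignoring padding, can be sketched as follows (a hypothetical helper using the relative `wav_lens` convention; doing it in the dataloader on unpadded signals would be even simpler):

```python
import torch

def normalize_wavs(wavs, wav_lens, eps=1e-8):
    """Per-utterance waveform normalization that ignores padding (sketch).

    wavs:     (batch, samples) padded waveforms.
    wav_lens: relative lengths in [0, 1].
    """
    batch, n_samples = wavs.shape
    abs_lens = (wav_lens * n_samples).round().long()              # samples per utterance
    mask = torch.arange(n_samples)[None, :] < abs_lens[:, None]   # (batch, samples)
    mask = mask.to(wavs.dtype)

    n = mask.sum(dim=1, keepdim=True)                             # valid sample count
    mean = (wavs * mask).sum(dim=1, keepdim=True) / n
    var = ((wavs - mean) ** 2 * mask).sum(dim=1, keepdim=True) / n
    return (wavs - mean) / (var + eps).sqrt() * mask              # padding stays zero

wavs = torch.randn(2, 16000)
wav_lens = torch.tensor([1.0, 0.25])  # second wav is mostly padding
normed = normalize_wavs(wavs, wav_lens)
```

Each utterance ends up zero-mean and unit-variance over its valid samples only, so the amount of padding in the batch no longer shifts the statistics.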

Read more comments on GitHub >

Top Results From Across the Web

speechbrain - Question about LayerNorm in Wav2vec
I have a question about the layer normalization used in wav2vec for both HF and Fairseq implementations. Why using it on all (batch_size,...
Read more >
How should we normalize the audio signal for wav2vec 2.0 ...
First, I noticed that only the large configuration includes the normalisation: True YAML parameter. Hence, my first question is: why not both?
Read more >
arXiv:2010.12829v4 [cs.CL] 2 Jan 2021
Our key finding is that a minimalistic LNA. (LayerNorm and Attention) finetuning can achieve zero-shot crosslingual and cross- modality transfer ...
Read more >
How does Wav2Vec 2.0 feed output from Convolutional ...
Therefore this implies that the output from the Convolutional Feature Encoder will have varying lengths across batches. However, the Transformer ...
Read more >
Wav2Vec 2.0: Learning Speech Representations via Self ...
Image from original paper by authors. The raw speech is passed through a feature encoder (temporal CNN blocks + layer norm + GeLU ......
Read more >
