Question about LayerNorm in Wav2vec
Hi @TParcollet,
I have a question about the layer normalization used in wav2vec, in both the HF and fairseq implementations. Why is it applied over all (batch_size, sequence_length, hidden_size) dimensions of the wav2vec `last_hidden_state` output, instead of only over the hidden_size dimension, as shown in the examples of the PyTorch docs? (In that case it might better be called InstanceNorm.)
Thanks in advance!
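For reference, here is a minimal NumPy sketch (not the actual HF/fairseq code) of the difference the question is asking about: normalizing each position over the hidden dimension only, as `torch.nn.LayerNorm(hidden_size)` does, versus normalizing each utterance jointly over time and hidden dimensions:

```python
import numpy as np

def layer_norm_last_dim(x, eps=1e-5):
    # Normalize each (batch, time) position over hidden_size only,
    # as torch.nn.LayerNorm(hidden_size) does (before the affine w, b).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def norm_all_time_hidden(x, eps=1e-5):
    # Normalize each utterance jointly over (sequence_length, hidden_size),
    # the instance-norm-like behaviour the question describes.
    mu = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
h = rng.normal(size=(2, 50, 768))  # (batch, seq_len, hidden)

a = layer_norm_last_dim(h)
b = norm_all_time_hidden(h)

# Per-position statistics differ: the last-dim norm gives zero mean at
# every (batch, time) position, while the joint norm only guarantees
# zero mean per utterance.
print(np.allclose(a.mean(axis=-1), 0, atol=1e-6))  # True
print(np.allclose(b.mean(axis=-1), 0, atol=1e-6))  # False
```

Note this sketch omits the learned affine parameters `w` and `b`; it only illustrates which axes the statistics are computed over.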
Issue Analytics
- State:
- Created: a year ago
- Comments: 9 (4 by maintainers)
Top Results From Across the Web
- speechbrain - Question about LayerNorm in Wav2vec: I have a question about the layer normalization used in wav2vec for both HF and Fairseq implementations. Why using it on all (batch_size, ...
- How should we normalize the audio signal for wav2vec 2.0 ...: First, I noticed that only the large configuration includes the normalisation: True YAML parameter. Hence, my first question is: why not both?
- arXiv:2010.12829v4 [cs.CL] 2 Jan 2021: Our key finding is that a minimalistic LNA (LayerNorm and Attention) finetuning can achieve zero-shot cross-lingual and cross-modality transfer ...
- How does Wav2Vec 2.0 feed output from Convolutional ...: Therefore this implies that the output from the Convolutional Feature Encoder will have varying lengths across batches. However, the Transformer ...
- Wav2Vec 2.0: Learning Speech Representations via Self ...: The raw speech is passed through a feature encoder (temporal CNN blocks + layer norm + GeLU) ... Image from original paper by authors.
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi @TParcollet,
I think I've made some mistakes here. Normalizing each wav is fine, but it should not include the padded values. I forgot that `wav` is a padded batch... It would be better to do this somewhere in the dataloader, or to pass `wav_lens` to the model and normalize afterwards. Also, layer normalization has already been applied in HF and fairseq (with trained `w` and `b`). Should we do it one more time here?
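To illustrate the point about padding, here is a hypothetical sketch (the helper name `normalize_wavs` is made up; only `wav` and `wav_lens` come from the discussion above) of per-utterance normalization that excludes the zero-padded tail, assuming `wav_lens` holds relative lengths as SpeechBrain uses them:

```python
import numpy as np

def normalize_wavs(wavs, wav_lens, eps=1e-8):
    # Hypothetical helper: per-utterance mean/variance normalization that
    # computes statistics only over the valid samples, leaving padding at zero.
    batch, max_len = wavs.shape
    lens = np.round(wav_lens * max_len).astype(int)  # relative -> absolute lengths
    out = np.zeros_like(wavs)
    for i, n in enumerate(lens):
        valid = wavs[i, :n]
        out[i, :n] = (valid - valid.mean()) / (valid.std() + eps)
    return out

rng = np.random.default_rng(1)
wavs = np.zeros((2, 100))
wavs[0, :100] = rng.normal(2.0, 1.0, 100)  # full-length utterance
wavs[1, :60] = rng.normal(2.0, 1.0, 60)    # utterance padded with 40 zeros

out = normalize_wavs(wavs, np.array([1.0, 0.6]))

# The padded region stays zero, and the statistics of the valid part are
# not skewed by the padding.
print(np.allclose(out[1, 60:], 0))      # True
print(abs(out[1, :60].mean()) < 1e-6)   # True
```

Computing the mean and variance over the full padded row would instead pull the mean toward zero for short utterances, which is the mistake described above.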