do_normalize set to True by default for Wav2Vec2 tokenizer
Environment info
- `transformers` version: 4.6.1
- Platform: macOS-11.2.3-x86_64-i386-64bit
- Python version: 3.8.2
- PyTorch version (GPU?): 1.8.1 (False)
- Tensorflow version (GPU?): 2.4.1 (False)
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Information
Model I am using (Bert, XLNet …): Wav2Vec2
The problem arises when using:
- [x] the official example scripts: (give details below)
- [ ] my own modified scripts: (give details below)
The task I am working on is:
- [ ] an official GLUE/SQuAD task: (give the name)
- [ ] my own task or dataset: (give details below)
To reproduce
Steps to reproduce the behavior:
```python
import soundfile as sf
from transformers import Wav2Vec2Tokenizer

# Load a 16 kHz waveform from disk
wav_input_16khz, samplerate = sf.read(AUDIOFILE)

# Default tokenizer vs. one with normalization explicitly disabled
tokenizer = Wav2Vec2Tokenizer.from_pretrained("facebook/wav2vec2-base-960h")
tokenizer_2 = Wav2Vec2Tokenizer.from_pretrained("facebook/wav2vec2-base-960h", do_normalize=False)

features = tokenizer(wav_input_16khz, return_tensors="pt").input_values
features_2 = tokenizer_2(wav_input_16khz, return_tensors="pt").input_values
features == features_2
```
```
Out[1]: tensor([[False, False, False,  ..., False, False, False]])
```
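As a quick sanity check (not part of the original report), the effective setting can also be read directly off the loaded tokenizer. This is a minimal sketch assuming the tokenizer exposes the `do_normalize` value it was constructed with, which is how `Wav2Vec2Tokenizer` behaves in transformers 4.6.x:

```python
from transformers import Wav2Vec2Tokenizer

# The pretrained checkpoint's config appears to override the documented
# default, so the loaded tokenizer reports do_normalize=True.
tokenizer = Wav2Vec2Tokenizer.from_pretrained("facebook/wav2vec2-base-960h")
print(tokenizer.do_normalize)  # expected to print True per this report
```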
Expected behavior
As written in the documentation:

> do_normalize (:obj:`bool`, optional, defaults to :obj:`False`):
>     Whether or not to zero-mean unit-variance normalize the input. Normalizing can help to significantly improve the performance for some models, e.g., `wav2vec2-lv60 <https://huggingface.co/models?search=lv60>`__.

the option should be set to False by default. However, it seems to be set to True during initialization.
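For context, zero-mean unit-variance normalization rescales the raw waveform before it reaches the model, which is why the two feature tensors above differ element-wise. Below is a minimal sketch of the operation that `do_normalize` toggles; the epsilon for numerical stability is an assumption borrowed from common implementations, not quoted from the documentation:

```python
import numpy as np

def zero_mean_unit_variance(x: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Rescale a waveform to zero mean and (approximately) unit variance."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

# With do_normalize=True the tokenizer applies a transform like this to
# wav_input_16khz; with do_normalize=False the waveform passes through as-is.
```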
Top GitHub Comments
Sure, I will do it when I have free time 😃
Oh yeah, you’re right @Lhemamou! Would you maybe like to open a PR to fix the documentation? It should state that it defaults to `True` in this case.
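For reference, a hedged sketch of what the corrected docstring entry could look like; the exact wording of the eventual fix is not part of this thread:

> do_normalize (:obj:`bool`, optional, defaults to :obj:`True`):
>     Whether or not to zero-mean unit-variance normalize the input.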