Dithering constant
Why do torchaudio.compliance.kaldi.fbank and torchaudio.compliance.kaldi.spectrogram have such a large default dither parameter (=1.0)? Very often it simply drowns the entire output in noise.
It is common to use a dither value close to 0, e.g. 0.00001 in QuartzNet and Jasper, which are near-SOTA ASR models (https://github.com/NVIDIA/NeMo/blob/master/examples/asr/configs/quartznet15x5.yaml).
Note that even the torchaudio tutorial uses dither = 0.0: https://pytorch.org/tutorials/beginner/audio_preprocessing_tutorial.html.
Also see this issue and how it was resolved: https://github.com/pytorch/audio/issues/157
Issue Analytics
- State:
- Created 4 years ago
- Comments:9 (9 by maintainers)
Top GitHub Comments
Kaldi uses 1 as the default dither value. That is fine for Kaldi because a waveform in Kaldi has the range [-32768, 32767], and 1 is small relative to the maximum value 32767. However, torchaudio returns a tensor with values in the range [-1, 1]. So if you still use Kaldi's default value of 1, you will distort the audio signal.

Seconding this: the current default makes the whole torchaudio.compliance.kaldi feature set totally unusable out of the box. I spent an hour looking for possible bugs in the labels, only to find out that my model was basically being fed noise because of the default dither value.
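
The scale mismatch discussed in this thread can be checked numerically. The sketch below is not torchaudio or Kaldi code. It assumes that dithering adds zero-mean Gaussian noise whose standard deviation equals the dither value, and it uses a hypothetical helper, snr_db, to compare the signal-to-noise ratio of a full-scale sine wave in the two ranges.

```python
import math
import random


def snr_db(peak, dither, n=16000, freq=440.0, rate=16000.0, seed=0):
    """Return the SNR in dB of a sine wave with amplitude `peak` after
    adding Gaussian noise with standard deviation `dither`.

    Assumption: dithering is modelled as sample-wise additive Gaussian noise.
    """
    rng = random.Random(seed)
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        sample = peak * math.sin(2.0 * math.pi * freq * i / rate)
        noise = dither * rng.gauss(0.0, 1.0)
        signal_power += sample * sample
        noise_power += noise * noise
    return 10.0 * math.log10(signal_power / noise_power)


# Kaldi-style 16-bit range: dither = 1 is tiny next to the signal.
kaldi_scale = snr_db(peak=32767.0, dither=1.0)
# Normalized [-1, 1] range: dither = 1 is as large as the signal itself.
unit_scale = snr_db(peak=1.0, dither=1.0)
# Normalized range with a small dither, as in the QuartzNet config.
unit_scale_small = snr_db(peak=1.0, dither=1e-5)

print(f"int16 range, dither=1.0:  {kaldi_scale:.1f} dB")
print(f"[-1, 1] range, dither=1.0:  {unit_scale:.1f} dB")
print(f"[-1, 1] range, dither=1e-5: {unit_scale_small:.1f} dB")
```

Under these assumptions the 16-bit case comes out at roughly 87 dB, while the normalized case with dither = 1.0 lands at roughly -3 dB, meaning the noise carries more power than the signal. With dither = 1e-5 the normalized case is back to roughly 97 dB. Passing a small or zero dither explicitly, or rescaling the waveform to the 16-bit range before feature extraction, avoids the problem.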