Dithering constant


Why do torchaudio.compliance.kaldi.fbank and torchaudio.compliance.kaldi.spectrogram have such a large default dither parameter (=1.0)? Very often it just turns the whole output into noise.

It’s common to use a dither value close to 0, e.g. 0.00001 in QuartzNet and Jasper, which are near-SOTA ASR models (https://github.com/NVIDIA/NeMo/blob/master/examples/asr/configs/quartznet15x5.yaml).

Note that even the torchaudio tutorial uses dither = 0.0: https://pytorch.org/tutorials/beginner/audio_preprocessing_tutorial.html.

See also this issue and how it was resolved: https://github.com/pytorch/audio/issues/157

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

csukuangfj commented, May 8, 2020 (6 reactions)

"Why do torchaudio.compliance.kaldi.fbank and torchaudio.compliance.kaldi.spectrogram have such a large default dither parameter (=1.0)?"

Kaldi uses 1 as the default dither value. That is fine for Kaldi because a waveform in Kaldi has the range [-32768, 32767], so 1 is small relative to the maximum value of 32767.

However, in torchaudio, torchaudio.load(filename) returns a tensor with values in the range [-1, 1]. So if you still use the default value of 1 from Kaldi, you will distort the audio signal.
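The scale mismatch can be put into numbers with a rough sketch. It assumes that dithering adds zero-mean Gaussian noise whose standard deviation equals the dither value. The sine test signal and the helper name `snr_db` are hypothetical and only for illustration. The sketch uses the standard library alone, so it does not call torchaudio or Kaldi.

```python
import math
import random


def snr_db(signal, noise_std, seed=0):
    """Estimate the signal-to-noise ratio in dB for Gaussian dither of a given std."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, noise_std) for _ in signal]
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(signal_power / noise_power)


# Hypothetical test signal: one second of a 440 Hz sine at 16 kHz, half of full scale.
sample_rate = 16000
unit_scale = [
    0.5 * math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)
]
# The same signal expressed in a 16-bit sample range, as Kaldi expects.
int16_scale = [s * 32768 for s in unit_scale]

snr_int16 = snr_db(int16_scale, noise_std=1.0)            # 16-bit range, dither = 1
snr_unit = snr_db(unit_scale, noise_std=1.0)              # [-1, 1] range, dither = 1
snr_rescaled = snr_db(unit_scale, noise_std=1.0 / 32768)  # [-1, 1] range, dither scaled down

print(f"16-bit range, dither=1.0:     {snr_int16:.1f} dB")
print(f"[-1, 1] range, dither=1.0:    {snr_unit:.1f} dB")
print(f"[-1, 1] range, dither=1/32768: {snr_rescaled:.1f} dB")
```

Under these assumptions, the same dither value of 1 leaves the 16-bit signal far above the noise but puts the [-1, 1] signal below it. Dividing the dither by 32768 restores the original ratio. In practice, passing a small or zero `dither` explicitly to the torchaudio Kaldi-compliance functions avoids the problem.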

popcornell commented, Feb 29, 2020 (1 reaction)

Seconding this: the current default makes the torchaudio.compliance.kaldi features totally unusable out of the box. I spent an hour looking for possible bugs in my labels, only to find out that my model was basically being fed noise because of the default dither value.
