
`compute_loudness` issues


Hello,

I was looking at the loudness curves that spectral_ops.compute_loudness produces and found that they don’t describe the perceived loudness very well. An example is shown below:

The loudness has a little peak somewhere where there is no audible sound, and from the curve you cannot see the four distinct notes as clearly as I would expect.

Looking into the code, I think I found the issue:

  s = stft_fn(audio, frame_size=n_fft, overlap=overlap, pad_end=True)

  # Compute power.
  amplitude = lib.abs(s)
  power_db = amplitude_to_db(amplitude, use_tf=use_tf)

  # Perceptual weighting.
  frequencies = librosa.fft_frequencies(sr=sample_rate, n_fft=n_fft)
  a_weighting = librosa.A_weighting(frequencies)[lib.newaxis, lib.newaxis, :]
  loudness = power_db + a_weighting

  # Set dynamic range.
  loudness -= ref_db
  loudness = lib.maximum(loudness, -range_db)
  mean = tf.reduce_mean if use_tf else np.mean

  # Average over frequency bins.
  loudness = mean(loudness, axis=-1)

Before the mean is taken, loudness holds one decibel value per frequency bin, and I believe averaging decibel values does not make sense. Instead, the A-weighting should be applied to the power spectrum (in linear units, not dB), the weighted power should be summed over the frequency bins, and only then should the result be converted to the dB scale.
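A minimal NumPy sketch of the order of operations proposed above (weight the power spectrum, sum over bins, then convert to dB). The helper names `a_weighting_db` and `weighted_loudness_db`, the framing scheme, and the default parameters are assumptions made for illustration; this is not the ddsp implementation. The A-weighting curve uses the standard analytic formula rather than librosa, so the block has no dependency beyond NumPy.

```python
import numpy as np

def a_weighting_db(freqs):
    """A-weighting curve in dB, from the standard analytic formula."""
    f2 = np.asarray(freqs, dtype=np.float64) ** 2
    num = 12194.0 ** 2 * f2 ** 2
    den = ((f2 + 20.6 ** 2)
           * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    # Guard against log10(0) at the DC bin.
    return 20.0 * np.log10(np.maximum(num / den, 1e-20)) + 2.0

def weighted_loudness_db(audio, sample_rate=16000, n_fft=2048, hop=512):
    """Hypothetical helper: A-weight the power spectrum, sum, then go to dB."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop  # no end padding in this sketch
    frames = np.stack([audio[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=-1)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    weights = 10.0 ** (a_weighting_db(freqs) / 10.0)  # dB -> linear power gain
    weighted = np.sum(power * weights, axis=-1)       # sum over frequency bins
    return 10.0 * np.log10(np.maximum(weighted, 1e-20))

# Two tones of equal amplitude: the 1 kHz tone should come out louder
# than the 100 Hz tone once A-weighting is applied in the power domain.
t = np.arange(16000) / 16000.0
print(weighted_loudness_db(np.sin(2 * np.pi * 1000 * t)).mean())
print(weighted_loudness_db(np.sin(2 * np.pi * 100 * t)).mean())
```

With this ordering the weighting changes the relative contribution of each bin before the bins are combined, which is the property the dB-domain average lacks.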

As currently implemented, the A-weighting changes nothing except adding its mean value to every time step. I plotted the result of compute_loudness with and without A-weighting, adding the mean of the A-weighting curve in the case without, and this is the result (on another sample):
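Independently of the plot, the claim can be checked numerically: because the mean is linear, adding a per-bin offset in dB and then averaging over bins is the same as averaging first and adding the offset's mean. The sketch below uses made-up values and a stand-in weighting curve (both assumptions, not real audio or the real A-weighting). It also leaves out the `lib.maximum(loudness, -range_db)` clipping step, which can break the exact equality whenever values fall below the floor.

```python
import numpy as np

rng = np.random.default_rng(0)
power_db = rng.uniform(-80.0, 0.0, size=(5, 8))  # (frames, bins), made-up values
weighting_db = np.linspace(-30.0, 1.0, 8)        # stand-in for an A-weighting curve

# Weighting added per bin in dB, then averaged over bins.
with_weighting = np.mean(power_db + weighting_db, axis=-1)

# No weighting, but the mean of the curve added afterwards.
without_weighting = np.mean(power_db, axis=-1) + np.mean(weighting_db)

print(np.allclose(with_weighting, without_weighting))  # True
```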

So I looked into how librosa computes loudness, but found that it does not offer such a function, apparently because doing it properly is complex. There is some helpful code in this discussion, though.

Another, unrelated small issue is that the STFT is computed with padding, which makes the loudness drop sharply at the end of every sample.
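The padding effect can be illustrated with a rough sketch. The framing function below is a hypothetical approximation of end-padded framing, not the exact behavior of the STFT used in ddsp: it zero-pads the signal so that every hop position gets a full frame, which means the last frames consist mostly of zeros and their energy drops even though the tone itself is steady.

```python
import numpy as np

def frame_signal(audio, frame_size, hop, pad_end):
    """Hypothetical framing helper; pad_end zero-pads so no samples are dropped."""
    if pad_end:
        n_frames = int(np.ceil(len(audio) / hop))
        padded_len = (n_frames - 1) * hop + frame_size
        audio = np.pad(audio, (0, max(0, padded_len - len(audio))))
    else:
        n_frames = 1 + (len(audio) - frame_size) // hop
    return np.stack([audio[i * hop:i * hop + frame_size]
                     for i in range(n_frames)])

def frame_energy_db(frames):
    """Mean power per frame, in dB."""
    return 10.0 * np.log10(np.maximum(np.mean(frames ** 2, axis=-1), 1e-20))

sr = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # steady 1-second tone

padded = frame_energy_db(frame_signal(tone, 2048, 512, pad_end=True))
unpadded = frame_energy_db(frame_signal(tone, 2048, 512, pad_end=False))

print(padded[0], padded[-1])      # last frame is much quieter than the first
print(unpadded[0], unpadded[-1])  # first and last frames are nearly equal
```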

Let me know if you agree that these things are issues, and if I should work on a PR for it.

Issue Analytics

Created: 2 years ago
Reactions: 1
Comments: 9 (1 by maintainers)

Top GitHub Comments

3 reactions

nielsrolf commented, Jul 22, 2021

I implemented the fix as proposed by @PluieElectrique in pollinations/ddsp. A demo of the new method can be found in this colab.

I added a use_buggy_loudness parameter that defaults to true when the ddsp version is '1.6.2' or lower.

3 reactions

PluieElectrique commented, Jul 18, 2021

I agree with what’s been said so far. The way the current code adds the A-weighting and then averages over decibels means that the loudness values are only shifted a little. Furthermore, take a look at this frequency sweep from 0 Hz to 8 kHz:

[Figure: sweep-audacity]

This is a linear spectrogram in Audacity with an FFT size of 2048, Hann windows, and a zero-padding factor of 1. Notice the ringing around the curve (I believe this is known as spectral leakage). Now, look at the loudness features for this sweep:

[Figure: sweep-ddsp]

The blue line is using the NumPy loudness code. The orange line is the same code, but with

loudness = np.mean(loudness, axis=-1)

replaced by

loudness = np.mean(np.power(10, loudness / 10.0), axis=-1)
loudness = 10.0 * np.log10(np.maximum(1e-20, loudness))

so that instead of taking the mean over decibels, we convert to power, take the mean, and then convert back to decibels. This gets rid of the artifacts and produces a smooth curve, which seems to better match the smooth rise in loudness of the sweep.
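A toy frame makes the difference between the two averages concrete. The helper names `mean_db` and `mean_power_db` and the four-bin frame are invented for illustration. One loud bin next to three near-silent bins is dragged far down by the decibel mean, while the power mean stays close to the loud bin's level.

```python
import numpy as np

def mean_db(loudness_db):
    """Average directly over decibel values (the behavior being criticized)."""
    return np.mean(loudness_db, axis=-1)

def mean_power_db(loudness_db):
    """Convert to power, average, then convert back to decibels."""
    power = np.power(10.0, loudness_db / 10.0)
    return 10.0 * np.log10(np.maximum(1e-20, np.mean(power, axis=-1)))

# One bin at 0 dB, three bins at -100 dB.
frame = np.array([0.0, -100.0, -100.0, -100.0])

print(mean_db(frame))        # -75.0
print(mean_power_db(frame))  # about -6.02
```

The decibel mean reports -75 dB for a frame whose total energy sits almost entirely in a 0 dB bin, which is why quiet bins (including leakage) can distort the curve.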

I suppose there’s a chance that the fluctuations of the blue line help the model by giving it extra information. But it seems unlikely, because DDSP is supposed to extract loudness in the same way as “Fast and Flexible Neural Audio Synthesis”, and it says in the FFNAS appendix that: (emphasis mine)

To calculate loudness, we use librosa’s perceptual_weighting() function on the square of the STFT. This produces a spectrum in dB, which we convert back into a linear scale and compute a mean over frequency bins. This value is then scaled via log compression with a small offset eps = 1e-5 to prevent overflow.
