`compute_loudness` issues
Hello,
I was looking at the loudness curves that spectral_ops.compute_loudness produces and found that they don’t describe the perceived loudness very well. An example is shown below:

The loudness curve has a small peak at a point where there is no audible sound, and the four distinct notes do not show up in the curve as clearly as I would expect.
Looking into the code, I think I found the issue:
s = stft_fn(audio, frame_size=n_fft, overlap=overlap, pad_end=True)
# Compute power.
amplitude = lib.abs(s)
power_db = amplitude_to_db(amplitude, use_tf=use_tf)
# Perceptual weighting.
frequencies = librosa.fft_frequencies(sr=sample_rate, n_fft=n_fft)
a_weighting = librosa.A_weighting(frequencies)[lib.newaxis, lib.newaxis, :]
loudness = power_db + a_weighting
# Set dynamic range.
loudness -= ref_db
loudness = lib.maximum(loudness, -range_db)
mean = tf.reduce_mean if use_tf else np.mean
# Average over frequency bins.
loudness = mean(loudness, axis=-1)
Before the mean is taken, loudness holds one decibel value per frequency bin, and I believe it does not make sense to average decibel values. Instead, the A-weighting should be applied to the power spectrum (not in dB), the weighted power should be summed over the frequency bins, and only then should the result be converted to the dB scale.
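A minimal NumPy sketch of the order of operations proposed here (weight the power spectrum, sum over bins, then convert to dB). The function names and the default values for `ref_db` and `range_db` are hypothetical and chosen only for illustration. The A-weighting is computed from the standard analog formula instead of calling librosa, so this is not the DDSP implementation.

```python
import numpy as np


def a_weighting_db(frequencies):
    """A-weighting curve in dB from the standard analog formula."""
    f2 = np.asarray(frequencies, dtype=np.float64) ** 2
    num = 12194.0**2 * f2**2
    den = ((f2 + 20.6**2)
           * np.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
           * (f2 + 12194.0**2))
    # A tiny floor avoids log10(0) at the DC bin.
    return 20.0 * np.log10(np.maximum(num / den, 1e-20)) + 2.0


def loudness_power_domain(magnitudes, sample_rate, n_fft,
                          ref_db=0.0, range_db=120.0):
    """Loudness in dB from |STFT| magnitudes of shape [..., n_fft // 2 + 1]."""
    frequencies = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    # Convert the dB weights to linear power gains.
    weights = 10.0 ** (a_weighting_db(frequencies) / 10.0)
    # Weight the power spectrum and sum over frequency bins.
    weighted_power = np.sum(magnitudes**2 * weights, axis=-1)
    # Convert to dB only after the sum.
    loudness_db = 10.0 * np.log10(np.maximum(weighted_power, 1e-20))
    return np.maximum(loudness_db - ref_db, -range_db)
```

With this ordering, a few strong bins dominate the result, which is closer to how a single loud partial is perceived than an average over mostly quiet bins.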
As currently implemented, the A-weighting changes nothing except adding its mean value to every time step. I plotted the result of compute_loudness with and without A-weighting, adding the mean A-weighting value in the unweighted case, and this is the result (on another sample):
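The constant-offset effect follows from the linearity of the mean, and can be checked with a small sketch. The arrays here are random stand-ins, not real spectrogram or A-weighting values. The equality is exact only as long as no bin is clipped by the `-range_db` floor, which the original code applies before averaging.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_bins = 8, 1025

# Stand-in spectrogram in dB and stand-in per-bin weights in dB.
power_db = rng.uniform(-60.0, 0.0, size=(n_frames, n_bins))
a_weighting = rng.uniform(-40.0, 1.0, size=(1, n_bins))

# Add the weighting per bin, then average over frequency (as in the issue).
weighted_then_mean = np.mean(power_db + a_weighting, axis=-1)

# Average the unweighted dB values, then add one constant offset.
mean_plus_offset = np.mean(power_db, axis=-1) + np.mean(a_weighting)

print(np.allclose(weighted_then_mean, mean_plus_offset))
```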

I looked into how librosa computes loudness, but it does not offer such a function, apparently because it is considered too complex. There is some helpful code in this discussion, though.
Another, unrelated small issue is that the STFT is computed with padding, which makes the loudness drop sharply at the end of every sample.
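The effect of end padding can be illustrated with a simple framing sketch. This is a hypothetical framing routine that zero-pads the end of the signal, assumed here to mimic `pad_end=True`, and it is not DDSP's STFT code. The frame size, hop size and test tone are arbitrary.

```python
import numpy as np

sample_rate, frame_size, hop = 16000, 2048, 512
t = np.arange(sample_rate) / sample_rate
audio = np.sin(2 * np.pi * 440.0 * t)  # constant-amplitude tone, 1 second

# Zero-pad the end so that every hop position yields a full frame.
n_frames = int(np.ceil(len(audio) / hop))
pad = (n_frames - 1) * hop + frame_size - len(audio)
padded = np.pad(audio, (0, pad))
frames = np.stack([padded[i * hop:i * hop + frame_size]
                   for i in range(n_frames)])

# Per-frame energy in dB; the last frames are mostly zeros.
frame_db = 10.0 * np.log10(np.maximum(np.mean(frames**2, axis=-1), 1e-20))
print(frame_db[0], frame_db[-1])
```

Although the tone has constant amplitude, the final frames contain mostly padding, so their energy falls well below that of the earlier frames.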
Let me know if you agree that these things are issues, and if I should work on a PR for it.

I implemented the fix as proposed by @PluieElectrique in pollinations/ddsp. A demo of the new method can be found in this colab.
I added a `use_buggy_loudness` parameter that is set to true by default if the ddsp version is less than or equal to `'1.6.2'`.

I agree with what’s been said so far. The way the current code adds the A-weighting and then averages over decibels means that the loudness values are only shifted a little. Furthermore, take a look at this frequency sweep from 0 Hz to 8 kHz:
This is a linear spectrogram in Audacity with an FFT size of 2048, Hann windows, and zero padding 1. Notice the ringing around the curve (this is known as spectral leakage?). Now, look at the loudness features for this sweep:
The blue line is using the NumPy loudness code. The orange line is the same code, but with
replaced by
so that instead of taking the mean over decibels, we convert to power, take the mean, and then convert back to decibels. This gets rid of the artifacts and produces a smooth curve, which seems to better match the smooth rise in loudness of the sweep.
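The difference between the two averaging orders can be sketched as follows. The helper names are hypothetical, and this is an illustration of the idea, not the exact lines that were swapped in the comparison above.

```python
import numpy as np


def mean_of_db(magnitudes, amin=1e-20):
    """Convert each bin to dB, then average over frequency."""
    power_db = 10.0 * np.log10(np.maximum(magnitudes**2, amin))
    return np.mean(power_db, axis=-1)


def db_of_mean_power(magnitudes, amin=1e-20):
    """Average the power over frequency, then convert to dB."""
    mean_power = np.mean(magnitudes**2, axis=-1)
    return 10.0 * np.log10(np.maximum(mean_power, amin))


# One frame with a single strong bin over a quiet floor.
frame = np.full(1025, 1e-4)
frame[100] = 1.0

print(mean_of_db(frame), db_of_mean_power(frame))
```

The mean over decibels is dominated by the many quiet bins, so small changes in leakage around a tone move it noticeably. The power mean is dominated by the strong bin and is far less sensitive to those fluctuations.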
I suppose there’s a chance that the fluctuations of the blue line help the model by giving it extra information. But it seems unlikely, because DDSP is supposed to extract loudness in the same way as “Fast and Flexible Neural Audio Synthesis”, and it says in the FFNAS appendix that: (emphasis mine)