Equal-loudness contour (ISO 226)
What’s the best way to calculate loudness per frame? And can we make it a function?
TL;DR
Is it this?
S = abs(FFT(y))**2 # power spectrogram
weighting = A_weighting # weighting in dB
weighting = 10**(weighting/10) # weighting for power spectrogram
S *= weighting # perceptually weighted power spectrogram
S = melspectrogram(S) # perceptual pitch distances
loudness = np.mean(S, axis=0) # mean over frequency gives one value per frame; taking the mean is ok, because not in dB
loudness = logamplitude(loudness) # convert to dB
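The steps above can be sketched as a small function. This is a minimal NumPy-only sketch, not librosa's implementation: the names a_weighting_db and frame_loudness_db are hypothetical, the A-weighting curve is the standard analytic formula (librosa exposes a comparable curve as librosa.A_weighting), the mel step is left out for brevity, and the result is a level in dB relative to an arbitrary reference power of 1.0, not a calibrated loudness.

```python
import numpy as np

def a_weighting_db(freqs_hz):
    """A-weighting curve in dB (standard analytic form, about 0 dB at 1 kHz)."""
    f2 = np.asarray(freqs_hz, dtype=float) ** 2
    num = (12194.0 ** 2) * f2 ** 2
    den = ((f2 + 20.6 ** 2)
           * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    # At 0 Hz the ratio is 0, so the weight is -inf dB (a linear gain of 0).
    with np.errstate(divide="ignore"):
        return 20.0 * np.log10(num / den) + 2.0

def frame_loudness_db(y, sr, n_fft=2048, hop_length=512):
    """Per-frame A-weighted level in dB; assumes len(y) >= n_fft."""
    y = np.asarray(y, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop_length
    # Slice the signal into overlapping, windowed frames: shape (n_fft, n_frames).
    frames = np.stack(
        [y[i * hop_length: i * hop_length + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    S = np.abs(np.fft.rfft(frames, axis=0)) ** 2          # power spectrogram
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)            # bin centre frequencies
    weights = 10.0 ** (a_weighting_db(freqs) / 10.0)      # dB -> linear power gain
    power = np.mean(S * weights[:, np.newaxis], axis=0)   # mean over frequency
    # Floor the power so silent frames do not produce -inf.
    return 10.0 * np.log10(np.maximum(power, 1e-10))

# Usage: two tones of equal amplitude; the 50 Hz tone is attenuated by the weighting.
sr = 22050
t = np.arange(sr) / sr
level_1k = frame_loudness_db(np.sin(2 * np.pi * 1000.0 * t), sr)
level_50 = frame_loudness_db(np.sin(2 * np.pi * 50.0 * t), sr)
```

The weighting is applied on the linear-frequency power spectrogram, before any mel pooling, so narrow low-frequency content is weighted at full FFT resolution.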
Longer Explanation
Say we have a mel spectrogram
y = load(mp3)
S = np.abs(FFT(y)) # magnitude spectrogram
S = S**2 # power spectrogram
S = melspectrogram(S) # convert frequencies to mel scale
We can weight the perceived loudness of frequencies differently this way:
dB = logamplitude(S)
perceptual_weighting = A_weighting + dB # in the dB domain the weighting is an additive offset
But for loudness, we don’t want the mean of dB values:
loudness = mean(perceptual_weighting)
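because dB values add on the power scale, not arithmetically: L_total = 10 * log10(sum_i (p_i**2 / p0**2)) = 10 * log10(sum_i 10**(L_i / 10))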
Can we use this equation to add up the dB frequencies to get loudness? It’s more expensive because of extra squaring and division.
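To make the difference concrete, here is a small sketch comparing the arithmetic mean of dB values with the level obtained by going through power. It assumes the usual rule for incoherent sources, L = 10 * log10(sum_i 10**(L_i / 10)); the helper names db_sum and db_mean_power are made up for illustration.

```python
import math

def db_sum(levels_db):
    """Combine levels by converting to power, summing, and converting back to dB."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))

def db_mean_power(levels_db):
    """Level of the mean power (as opposed to the mean of the dB values)."""
    mean_power = sum(10.0 ** (level / 10.0) for level in levels_db) / len(levels_db)
    return 10.0 * math.log10(mean_power)

levels = [80.0, 40.0]
naive_mean = sum(levels) / len(levels)   # 60.0: arithmetic mean of the dB values
power_mean = db_mean_power(levels)       # close to 77: dominated by the louder band
two_equal = db_sum([60.0, 60.0])         # about 63: doubling the power adds ~3 dB
```

The arithmetic mean lets quiet bands drag the result down far more than they contribute in power, which is why averaging dB values directly understates the level.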
OR
Can we weight the power spectrogram instead? Then taking the mean is fine, and we can calculate the dB from there:
perceptual_weighting = A_weighting * power
loudness = logamplitude(mean(perceptual_weighting))
But A_weighting is expressed in dB, a logarithmic scale, so we need to convert it to a linear gain before applying it to a power spectrogram:
weighting = 10**(A_weighting/10) # is this correct?
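On the "is this correct?" question: adding w dB to a level is the same as multiplying the underlying power by 10**(w/10), so that conversion matches a power spectrogram (a magnitude spectrogram would need 10**(w/20) instead). A quick check, with hypothetical helper names and an illustrative weighting value:

```python
import math

def db_to_power_gain(w_db):
    """Linear gain to multiply a power value by, given a weighting in dB."""
    return 10.0 ** (w_db / 10.0)

def db_to_amplitude_gain(w_db):
    """Linear gain to multiply a magnitude value by, given a weighting in dB."""
    return 10.0 ** (w_db / 20.0)

power = 0.02     # arbitrary power value for one frequency bin
w_db = -30.2     # illustrative weighting in dB (roughly the A-weighting near 50 Hz)

# Weighting applied in the dB domain: an additive offset.
in_db_domain = 10.0 * math.log10(power) + w_db
# Weighting applied in the power domain: a multiplicative gain, then convert to dB.
in_power_domain = 10.0 * math.log10(power * db_to_power_gain(w_db))
```

Both routes give the same level, and squaring the amplitude gain reproduces the power gain.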
And now we can calculate the mean just fine:
loudness = np.mean(power * weighting, axis=0) # mean over frequency, one value per frame
loudness = logamplitude(loudness) # in db
This is great, because we also want to apply the weighting before we calculate the mel spectrogram; otherwise we lose frequency resolution, which sometimes makes the sub-bass disappear, as in this example:
Top GitHub Comments
Is A-weighting good enough for you? Otherwise you can take a look at https://github.com/keunwoochoi/perceptual_weighting, which is perceptual weighting code based on ISO 226. @bmcfee I was once thinking about opening a PR for this. What do you think?
Okay, that’s easy enough as well. Since you have a constant factor of p0**2 in the summation, you can factor it out of the log just like any other reference power. If you’re on the 0.4 series, replace the last line by
Not sure – what would the PR add exactly? Different weighting curves?