Normalise Mel filters to constant energy of 1
There is some scaling introduced by the Mel filter bank when calculating the Mel spectrogram. As stated in the documentation, setting `norm=1` gives constant energy per filter, but that energy is not 1. In fact, summing the weights of each filter does not give 1, but rather an average of 0.0929 (for `n_mels=128`).
This is due to the Mel frequency bandwidth of the triangles used for normalisation, but I think it might make sense to have the sum of the filter weights equal 1 when normalised, perhaps as an additional option.
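To see the effect concretely, here is a minimal numpy sketch of a triangular mel filter bank. This is an illustration only, not librosa's implementation (for one thing, it uses the HTK mel scale, whereas librosa defaults to a Slaney-style scale). With area ("slaney") normalisation each filter sums to roughly `1 / (sr / n_fft)`, i.e. about 0.093 at the default parameters, while an L1 option makes each filter sum to exactly 1:

```python
import numpy as np

def hz_to_mel(f):
    # HTK mel scale; illustration only (librosa defaults to the Slaney scale)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr=22050, n_fft=2048, n_mels=128, norm="slaney"):
    # Center frequencies of the FFT bins, and mel-spaced band edges in Hz
    fft_freqs = np.linspace(0.0, sr / 2.0, 1 + n_fft // 2)
    hz_edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    weights = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        lo, ctr, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        tri = np.minimum((fft_freqs - lo) / (ctr - lo),
                         (hi - fft_freqs) / (hi - ctr))
        weights[i] = np.maximum(0.0, tri)
    if norm == "slaney":
        # Constant *area* per filter: divide by half the bandwidth in Hz
        weights *= (2.0 / (hz_edges[2:] - hz_edges[:-2]))[:, np.newaxis]
    elif norm == 1:
        # The proposed option: make each filter's weights sum to exactly 1
        sums = weights.sum(axis=1, keepdims=True)
        weights /= np.where(sums > 0, sums, 1.0)
    return weights

slaney = mel_filterbank(norm="slaney")
print(slaney.sum(axis=1).mean())   # well below 1: roughly 1 / (sr / n_fft)

l1 = mel_filterbank(norm=1)
print(l1.sum(axis=1))              # every filter now sums to 1
```

The residual factor is no accident: area normalisation divides each triangle by its area in Hz, so summing over discrete FFT bins leaves a factor of the bin spacing `sr / n_fft` (1/10.77 ≈ 0.093 at the defaults), consistent with the 0.0929 average reported above.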
Issue Analytics
- State:
- Created 4 years ago
- Comments: 16 (13 by maintainers)
Top GitHub Comments
I think I agree with that, and don't see a need for implementing proposal 1. (Users who really want that behavior can always scale by `sr / n_fft` post hoc.)

Where it gets a little tricky is in dealing with FFT normalization modes. By default, numpy's FFT is unnormalized, though since 1.10 (released in 2016) you can specify `norm='ortho'` to scale by `1/sqrt(n_fft)`. I think at some point it's worth considering extending `stft` (and related methods) to expose this functionality, which did not exist when we first designed the API.

However, this raises a bit of a problem for downstream processing like `mel`. Slaney's normalization (our current method) cancels out the factor of `n_fft`, so that mel spectra computed with different frame lengths (but the same sampling rate) should have comparable scale. This is a nice property to have when you want to experiment with things like frequency over-sampling. Proposals 1 and 2 lose this feature, in favor of propagating frame-length scaling through to the output. (It typically comes out in the wash of `power_to_db` anyway, especially when using an adaptive reference point, but it's nice to account for all variables here.)

However, if we support `ortho` mode, we lose this property in mel spectra with the current normalization (and proposal 1). Proposal 2 will have consistent behavior, though; I would actually prefer a normalization where `norm` can be any supported normalization (l1, l2, inf, None, etc.).

I think it's helpful here to follow the principle of least surprise. For consistency with the rest of the API, the `norm=` parameter should conform to the values supported by `util.normalize()`. We can add extra modes, encoded by string values, but we should not change the definition of existing conventions (1, 2, inf, None) or add alternate identifiers for them; L1 should just be `norm=1`.

As for how to name the current normalization… maybe `'slaney'` is okay after all? Since we haven't been able to come up with a pithy name that isn't easily misconstrued, referring to it by attribution rather than description could be a nice compromise.

@bmcfee Thanks for giving many suggestions!
This perspective was new to me. It made me understand that following the APIs of other existing functions (like `normalize`) can reduce user confusion. I think your proposed `norm` option is best.
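The convention the thread converges on can be sketched as a small helper. This is a hypothetical illustration, not librosa's code: `normalize_filters` and its arguments are invented names, but the semantics follow the proposal. Numeric values behave like ordinary vector p-norms applied per filter, `None` disables normalization, and the string `'slaney'` selects the current constant-area normalization:

```python
import numpy as np

def normalize_filters(weights, hz_edges=None, norm=None):
    """Apply a `norm` convention to the rows of a filter bank matrix.

    Hypothetical helper: numeric norms (1, 2, np.inf) rescale each filter
    to unit p-norm, None is a no-op, and 'slaney' applies constant-area
    normalization using the filters' band edges in Hz.
    """
    if norm is None:
        return weights
    if norm == "slaney":
        if hz_edges is None:
            raise ValueError("'slaney' norm needs the filter band edges in Hz")
        # Constant area: divide each triangle by half its bandwidth
        enorm = 2.0 / (hz_edges[2:] - hz_edges[:-2])
        return weights * enorm[:, np.newaxis]
    # Numeric norms defer to the ordinary vector p-norm, row by row
    lengths = np.linalg.norm(weights, ord=norm, axis=1, keepdims=True)
    lengths[lengths == 0] = 1.0  # leave all-zero filters untouched
    return weights / lengths

# Two toy triangular filters over four FFT bins
w = np.array([[0.5, 1.0, 0.5, 0.0],
              [0.0, 0.25, 0.5, 0.25]])
print(normalize_filters(w, norm=1).sum(axis=1))  # each row sums to 1
```

Keeping the numeric values identical to the rest of the API (as with `util.normalize`) and reserving strings for the extra modes is exactly the "least surprise" design discussed above.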