Normalise Mel filters to constant energy of 1
There is some scaling introduced by the Mel filter bank when calculating the Mel spectrogram. As stated in the documentation, setting `norm=1` gives constant energy per filter, but that energy is not 1. In fact, summing the weights of each filter does not give 1, but rather an average of 0.0929 (for `n_mels=128`).
This is due to the Mel frequency bandwidth of the triangles used for normalisation, but I think it might make sense to have the sum of the filter weights equal 1 when normalised, perhaps as an additional option.
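To see the effect concretely, here is a minimal numpy sketch of a triangular mel filter bank. This is an illustration only, not librosa's implementation (for one thing, it uses the HTK mel scale, whereas librosa defaults to a Slaney-style scale). With area ("slaney") normalisation each filter sums to roughly `1 / (sr / n_fft)`, i.e. about 0.093 at the default parameters, while an L1 option makes each filter sum to exactly 1:

```python
import numpy as np

def hz_to_mel(f):
    # HTK mel scale; illustration only (librosa defaults to the Slaney scale)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr=22050, n_fft=2048, n_mels=128, norm="slaney"):
    # Center frequencies of the FFT bins, and mel-spaced band edges in Hz
    fft_freqs = np.linspace(0.0, sr / 2.0, 1 + n_fft // 2)
    hz_edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    weights = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        lo, ctr, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        tri = np.minimum((fft_freqs - lo) / (ctr - lo),
                         (hi - fft_freqs) / (hi - ctr))
        weights[i] = np.maximum(0.0, tri)
    if norm == "slaney":
        # Constant *area* per filter: divide by half the bandwidth in Hz
        weights *= (2.0 / (hz_edges[2:] - hz_edges[:-2]))[:, np.newaxis]
    elif norm == 1:
        # The proposed option: make each filter's weights sum to exactly 1
        sums = weights.sum(axis=1, keepdims=True)
        weights /= np.where(sums > 0, sums, 1.0)
    return weights

slaney = mel_filterbank(norm="slaney")
print(slaney.sum(axis=1).mean())   # well below 1: roughly 1 / (sr / n_fft)

l1 = mel_filterbank(norm=1)
print(l1.sum(axis=1))              # every filter now sums to 1
```

The residual factor is no accident: area normalisation divides each triangle by its area in Hz, so summing over discrete FFT bins leaves a factor of the bin spacing `sr / n_fft` (1/10.77 ≈ 0.093 at the defaults), consistent with the 0.0929 average reported above.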
Issue Analytics
- State:
- Created 4 years ago
- Comments: 16 (13 by maintainers)
Top GitHub Comments
I think I agree with that, and don't see a need for implementing proposal 1. (Users who really want that behavior can always scale by `sr / n_fft` post hoc.)

Where it gets a little tricky is in dealing with FFT normalization modes. By default, numpy's FFT is unnormalized, though since 1.10 (released in 2016) you can specify `norm='ortho'` to scale by `1/sqrt(n_fft)`. I think at some point it's worth considering extending `stft` (and related methods) to expose this functionality, which did not exist when we first designed the API.

However, this raises a bit of a problem for downstream processing like `mel`. Slaney's normalization (our current method) cancels out the factor of `n_fft`, so that mel spectra computed with different frame lengths (but the same sampling rate) should have comparable scale. This is a nice property to have when you want to experiment with things like frequency over-sampling. Proposals 1 and 2 lose this feature, in favor of propagating frame-length scaling through to the output. (It typically comes out in the wash of `power_to_db` anyway, especially when using an adaptive reference point, but it's nice to account for all variables here.)

However, if we support `ortho` mode, we lose this property in mel spectra with the current normalization (and proposal 1). Proposal 2 will have consistent behavior, though; I would actually prefer a normalization where `norm` can be any supported normalization (l1, l2, inf, None, etc.).

I think it's helpful here to follow the principle of least surprise. For consistency with the rest of the API, the `norm=` parameter should conform to the values supported by `util.normalize()`. We can add extra modes, encoded by string values, but we should not change the definition of existing conventions (1, 2, inf, None) or add alternate identifiers for them; L1 should just be `norm=1`.

As for how to name the current normalization… maybe `'slaney'` is okay after all? Since we haven't been able to come up with a pithy name that isn't easily misconstrued, referring to it by attribution rather than description could be a nice compromise.

@bmcfee Thanks for giving many suggestions!
This perspective was new to me. It made me understand that following the APIs of other existing functions (like `normalize`) can reduce user confusion. I think your proposed `norm` option is best.
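The convention the thread converges on can be sketched as a small helper. This is a hypothetical illustration, not librosa's code: `normalize_filters` and its arguments are invented names, but the semantics follow the proposal. Numeric values behave like ordinary vector p-norms applied per filter, `None` disables normalization, and the string `'slaney'` selects the current constant-area normalization:

```python
import numpy as np

def normalize_filters(weights, hz_edges=None, norm=None):
    """Apply a `norm` convention to the rows of a filter bank matrix.

    Hypothetical helper: numeric norms (1, 2, np.inf) rescale each filter
    to unit p-norm, None is a no-op, and 'slaney' applies constant-area
    normalization using the filters' band edges in Hz.
    """
    if norm is None:
        return weights
    if norm == "slaney":
        if hz_edges is None:
            raise ValueError("'slaney' norm needs the filter band edges in Hz")
        # Constant area: divide each triangle by half its bandwidth
        enorm = 2.0 / (hz_edges[2:] - hz_edges[:-2])
        return weights * enorm[:, np.newaxis]
    # Numeric norms defer to the ordinary vector p-norm, row by row
    lengths = np.linalg.norm(weights, ord=norm, axis=1, keepdims=True)
    lengths[lengths == 0] = 1.0  # leave all-zero filters untouched
    return weights / lengths

# Two toy triangular filters over four FFT bins
w = np.array([[0.5, 1.0, 0.5, 0.0],
              [0.0, 0.25, 0.5, 0.25]])
print(normalize_filters(w, norm=1).sum(axis=1))  # each row sums to 1
```

Keeping the numeric values identical to the rest of the API (as with `util.normalize`) and reserving strings for the extra modes is exactly the "least surprise" design discussed above.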