Spectrogram calculation issue with power=1

🐛 Bug

https://github.com/pytorch/audio/blob/42a705d51eeea34242ec54e902a90728e0d10e72/torchaudio/functional.py#L209

While the linked line is correct when power=2, with power=1 it simply sums the real part and the imaginary part rather than computing the magnitude.
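The discrepancy can be reproduced without tensors. The sketch below is only an illustration: it assumes the linked line raises each of the two components to `power` and then sums them, and the helper names `summed_power` and `magnitude_power` are hypothetical, not part of torchaudio.

```python
import math

def summed_power(re, im, power):
    # Assumed behaviour of the linked line: raise each component to `power`, then sum.
    return re ** power + im ** power

def magnitude_power(re, im, power):
    # Magnitude (absolute value) of the complex number, raised to `power`.
    return math.hypot(re, im) ** power

re, im = 3.0, 4.0  # the complex number 3 + 4j, whose magnitude is 5
for power in (1, 2):
    print(power, summed_power(re, im, power), magnitude_power(re, im, power))
```

For 3 + 4j the two expressions agree at power=2 (both give 25) but differ at power=1, where the component sum gives 7 and the magnitude is 5.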

Issue Analytics

Created 4 years ago
Comments: 8 (6 by maintainers)

Top GitHub Comments

vincentqb commented, Aug 22, 2019

I see, so librosa does spec_f = spec_f.pow(2).sum(-1).sqrt().pow(power) or, using the new syntax, spec_f = complex_norm(spec_f, power=power). We should also check how Kaldi does it, to make sure we wouldn’t break the compliance interface.

vincentqb commented, Aug 22, 2019

So, if power is specified, you’d return the power of the magnitude (aka absolute value) of the complex number via spec_f = spec_f.pow(2).sum(-1).sqrt().pow(power) ?

The code currently behaves much like an L^p norm, (sum_i |x_i|^p)^{1/p}, even for non-integer p >= 0. The difference is that the code omits the absolute value and the p-th root. I would expect the L^p norm to have been the original intention.

For the case p=2, we recover the standard magnitude of a complex number, as you mentioned.
For p=1, we get |x_0| + |x_1| as mentioned above – not what the code is doing
For p=0, we get len((x,y)) – always 2 for complex numbers – what the code is doing
For p=infinity, we get max(|x_i|) – not currently supported by the code.
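The four cases above can be checked with a small sketch. `lp_norm` is a hypothetical helper written for this illustration, not a torchaudio function. It treats a complex number as the pair (real, imaginary) and, for p=0, returns the bare sum because the 1/p root is undefined.

```python
import math

def lp_norm(components, p):
    """L^p norm (sum_i |x_i|^p)^(1/p) of a sequence of real components."""
    if math.isinf(p):
        # Limit case: the largest absolute component.
        return max(abs(x) for x in components)
    total = sum(abs(x) ** p for x in components)
    if p == 0:
        # The 1/p root is undefined here; the bare sum just counts the components.
        return total
    return total ** (1.0 / p)

z = (-3.0, 4.0)  # real and imaginary parts of -3 + 4j
for p in (0, 1, 2, math.inf):
    print(p, lp_norm(z, p))
```

The negative real part shows why the absolute value matters: for p=1 the norm is 7, whereas summing the raw components gives 1.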

Mixing normalization by the L^2 norm (when using normalized) with other powers is a little strange.

I would expect power to be used to normalize by the p-th norm, when specified (and remove the normalized option if it does not interfere with other places).
If we want the magnitude, or other p-th norm, I would simply have a separate function to compute them, which the user can explicitly invoke.

Thoughts?