Explain difference to torch.stft

You mentioned in the readme that

Other GPU audio processing tools are torchaudio and tf.signal. But they are not using the neural network approach, and hence the Fourier basis can not be trained.

Can you explain this in more detail, please?

  • When would I benefit from the STFT in nnAudio compared to, let's say, torch.stft?

  • Does it make a difference which STFT I use when I am interested in a time-domain loss, i.e. does it change backprop?

Thanks!

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 14 (5 by maintainers)

Top GitHub Comments

jjhuang-ca commented, Aug 24, 2020 (1 reaction)

@carlthome The speech processing community long ago decided that win_length=400 is a good choice for time resolution, so it's basically standardized for speech recognition. You're right that the FFT is most efficient with a power-of-2 size, and 512 is the smallest power of 2 that fits 400 samples. The Hann window is applied to the 400 input samples, the result is zero-padded to 512, and then the FFT is taken. If you look at any speech processing pipeline, this is how it's done (in Kaldi, ESPNet, etc.).
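The windowing scheme described above can be sketched with torch.stft. This is a minimal illustration, not code from the thread: the 16 kHz sample rate, the 160-sample hop and the random placeholder signal are assumptions. Note that torch.stft pads a short window symmetrically to n_fft, which may differ from where other toolkits place the zeros.

```python
import torch

sample_rate = 16000   # assumed; 400 samples correspond to 25 ms at this rate
n_fft = 512           # FFT size, the power of 2 that fits the window
win_length = 400      # analysis window length from the comment above
hop_length = 160      # assumed 10 ms hop, not stated in the thread

# Placeholder input: one second of noise with shape (batch, samples).
waveform = torch.randn(1, sample_rate)

# Pass the Hann window explicitly; without `window`, torch.stft
# falls back to a rectangular window.
window = torch.hann_window(win_length)

spec = torch.stft(
    waveform,
    n_fft=n_fft,
    hop_length=hop_length,
    win_length=win_length,
    window=window,       # zero-padded on both sides to n_fft internally
    center=True,
    return_complex=True,
)

# One-sided spectrum: n_fft // 2 + 1 frequency bins per frame.
print(spec.shape)
```

With these assumed settings each frame has 257 frequency bins, even though only 400 samples per frame carry signal.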

KinWaiCheuk commented, Nov 26, 2019 (1 reaction)

I am curious though about trying nnAudio and will!

Thanks for the interest in nnAudio. nnAudio is still in an early stage of development. If you find any bugs or problems, feel free to ask here again; I will try my best to solve them and improve nnAudio.

torch.stft is a really good option, since, unlike torchaudio, it requires no extra dependency. What makes nnAudio different from torch.stft is that its STFT is trainable.
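To make "trainable STFT" concrete, here is a rough sketch of the general idea: the windowed Fourier basis is stored as 1D convolution kernels that are ordinary parameters, so an optimizer can update them. The class name TrainableSTFT and all of its arguments are hypothetical, and this is not nnAudio's actual implementation.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class TrainableSTFT(nn.Module):
    """Hypothetical magnitude STFT whose Fourier kernels are learnable."""

    def __init__(self, n_fft=512, hop_length=160):
        super().__init__()
        self.hop_length = hop_length
        n_bins = n_fft // 2 + 1
        # Build the basis in float64 for accuracy, then cast to float32.
        n = torch.arange(n_fft, dtype=torch.float64)
        k = torch.arange(n_bins, dtype=torch.float64).unsqueeze(1)
        angle = 2 * math.pi * k * n / n_fft
        window = torch.hann_window(n_fft, dtype=torch.float64)
        real = (torch.cos(angle) * window).float().unsqueeze(1)
        imag = (-torch.sin(angle) * window).float().unsqueeze(1)
        # Kernels have shape (n_bins, 1, n_fft). Registering them as
        # parameters is what makes the basis trainable.
        self.real_kernels = nn.Parameter(real)
        self.imag_kernels = nn.Parameter(imag)

    def forward(self, x):
        # x: (batch, samples) -> (batch, 1, samples) for conv1d.
        x = x.unsqueeze(1)
        real = F.conv1d(x, self.real_kernels, stride=self.hop_length)
        imag = F.conv1d(x, self.imag_kernels, stride=self.hop_length)
        # Small epsilon keeps the gradient of sqrt finite at zero.
        return torch.sqrt(real ** 2 + imag ** 2 + 1e-8)


layer = TrainableSTFT()
signal = torch.randn(2, 4000)     # placeholder batch of two signals
magnitude = layer(signal)         # (batch, n_bins, n_frames)
magnitude.sum().backward()        # gradients reach the Fourier kernels
```

At initialization this should match the magnitude of torch.stft with the same Hann window and center=False, up to floating-point error. The difference is that the kernels receive gradients, whereas torch.stft uses a fixed basis; gradients with respect to the input signal flow through both.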
