Semantics of n_fft, window length, and frame length

See original GitHub issue: https://github.com/librosa/librosa/issues/695

Description

In STFT and related methods (e.g., iirt), it is required that win_length <= n_fft, where n_fft is treated as the frame length and win_length is the length of the window function applied to the center of the frame. This is all fine if we want to over-sample (have more frequency bins than are strictly necessary for the effective span of the window). The API currently does not provide a means to under-sample by specifying a large frame length and a smaller number of FFT bins. (Believe it or not, there might be good reasons to want to do this sort of thing.)

What do folks think about relaxing the API, so that n_fft, win_length, and frame_length can each be independently specified? The semantics would be:

  • frame_length = the number of samples per frame;
  • win_length = the number of samples with non-zero window per frame;
  • n_fft = the number of (output) frequency bins;

and the win_length <= n_fft requirement would relax to win_length <= frame_length. By default, frame_length would be left as None and inherit from n_fft, so the semantics of the API would be backward-compatible. But a user could override the frame length to be larger (or smaller) than the number of frequency bins.
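To make the proposed defaults concrete, here is a minimal sketch of how the three lengths might be resolved. The helper `resolve_lengths` is hypothetical and not part of librosa; the rule that an unspecified win_length spans the whole frame is an assumption added for illustration.

```python
def resolve_lengths(n_fft, win_length=None, frame_length=None):
    """Resolve (n_fft, win_length, frame_length) under the proposed semantics.

    Hypothetical helper for illustration only; not a librosa function.
    """
    # frame_length=None inherits from n_fft, so existing calls are unchanged.
    if frame_length is None:
        frame_length = n_fft
    # Assumption: an unspecified window covers the whole frame.
    if win_length is None:
        win_length = frame_length
    # The relaxed constraint from the proposal: win_length <= frame_length.
    if win_length > frame_length:
        raise ValueError(
            "win_length={} exceeds frame_length={}".format(win_length, frame_length)
        )
    return n_fft, win_length, frame_length


# Backward-compatible call: everything follows n_fft.
print(resolve_lengths(2048))
# Spectral over-sampling: short window inside an n_fft-long frame.
print(resolve_lengths(2048, win_length=512))
# Under-sampling: long frame, fewer frequency bins.
print(resolve_lengths(512, win_length=1024, frame_length=4096))
```

The last call is the case the current API cannot express: the window is longer than n_fft but still fits inside the frame.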

Any thoughts, @dpwe @lostanlen ?

Issue Analytics

  • State: open
  • Created 5 years ago
  • Reactions: 1
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

2 reactions
dpwe commented, Apr 14, 2018

On Sat, Apr 14, 2018 at 9:10 AM, Brian McFee <notifications@github.com> wrote:

>> What’s the point of win_length independent of frame_length? If the frame samples outside of the win are zero’d out anyway? Unless we also allow win_length > frame_length, which I can at least imagine.

> These are already independent, and it mostly has to do with over-sampling (high n_fft over a shorter window). Since n_fft is currently tied to the frame length, this is the only way to do it.

The current args to librosa.core.stft are n_fft (the length of the vector subject to the FFT), hop_length (the sample advance between successive frames), and win_length (the full cycle of the window function).

By making hop_length small relative to n_fft, you get frame oversampling (many successive frames including the same samples). By making win_length much shorter than n_fft, you get spectral oversampling (more frequency points than the theoretical maximum level of detail in the Fourier transform).
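The two kinds of oversampling can be seen by counting frames and bins. This is a rough, dependency-free sketch: `hann`, `center_pad`, `count_frames`, and `count_bins` are hypothetical helpers written for this example, and the frame count assumes no edge padding, so it will differ from a centered STFT.

```python
import math


def hann(length):
    """Periodic Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def center_pad(window, size):
    """Zero-pad `window` on both sides so it sits in the middle of `size` samples."""
    if len(window) > size:
        raise ValueError("window is longer than the target size")
    left = (size - len(window)) // 2
    right = size - len(window) - left
    return [0.0] * left + list(window) + [0.0] * right


def count_frames(n_samples, frame_length, hop_length):
    """Number of complete frames when the signal is not padded at the edges."""
    if n_samples < frame_length:
        return 0
    return 1 + (n_samples - frame_length) // hop_length


def count_bins(n_fft):
    """Number of non-negative frequency bins for a real-input FFT of size n_fft."""
    return 1 + n_fft // 2


n_samples, n_fft = 22050, 2048

# Frame oversampling: a smaller hop gives more, heavily overlapping frames.
print(count_frames(n_samples, n_fft, hop_length=2048))  # 10
print(count_frames(n_samples, n_fft, hop_length=512))   # 40

# Spectral oversampling: a 512-sample window zero-padded into a 2048-sample
# frame gives 1025 bins, although 257 would cover the window's resolution.
padded = center_pad(hann(512), n_fft)
print(len(padded), count_bins(n_fft), count_bins(512))  # 2048 1025 257
```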

The actual number of samples “considered” in each transform is the min of n_fft and win_length. We might call this frame_length, although I’m not sure that’s what you were saying. If hop_length > frame_length, we’re completely skipping some samples in the input, which is rarely something you’d want to do. I’m OK with the library allowing the user to do this, although I think it ought to trigger a warning. It’s the trade-off between maximum flexibility and sanity checks.
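The sanity check suggested above could look roughly like this. `skipped_samples_per_hop` is a hypothetical helper, not a librosa function; it takes the effective frame length to be min(n_fft, win_length), as described in this comment, and warns rather than raising.

```python
import warnings


def skipped_samples_per_hop(n_fft, win_length, hop_length):
    """Return how many input samples fall between consecutive frames.

    Hypothetical helper: the effective frame length is assumed to be
    min(n_fft, win_length), following the comment above.
    """
    frame_length = min(n_fft, win_length)
    skipped = max(0, hop_length - frame_length)
    if skipped:
        # Allowed, but rarely intended, so warn instead of raising.
        warnings.warn(
            "hop_length={} > effective frame length {}: {} samples are "
            "skipped between frames".format(hop_length, frame_length, skipped)
        )
    return skipped


print(skipped_samples_per_hop(n_fft=2048, win_length=2048, hop_length=512))  # 0
print(skipped_samples_per_hop(n_fft=2048, win_length=512, hop_length=1024))  # 512
```

Using a warning keeps the flexibility while still flagging the case that is most likely a mistake.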

But I don’t understand a scenario in which you’d want to specify a frame_length other than min(n_fft, win_length). If you want to oversample the spectrum, use a win_length < n_fft. If you want to only transform the middle of a longer window (yucky because of the discontinuity at the edges, but OK), make win_length > n_fft. In either case, hop_length controls the number of output frames you get, (mostly) independent of win_length and n_fft (excepting the edge effects).

> If n_fft and frame_length become independent, it’s still useful to keep win_length separate for the purposes of padding / frame centering.

>> What happens when win_length > fft_length? We transform only the middle fft_length points? Or we time-alias?

> You mean frame_length > fft_length? I had time-aliasing in mind. fftpack handles this without any additional effort.

Time-aliasing (= summing up each fft_length chunk of the longer input to make a single fft_length vector) is kind of horrible because it introduces comb filtering at the fft_length period (time-domain cancellation of sinusoidal components that are out of phase when chopped into fft_length pieces). Again, I’m not sure there’s any great value in supporting this; it seems to me that the number of people who actually wanted this behavior would be much smaller than the number of people who accidentally invoked it then got super confused by the results!
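The cancellation described here is easy to reproduce on a toy example. The sketch below uses a naive DFT from the standard library to stay dependency-free; `dft` and `time_alias` are hypothetical helpers written for this illustration, with a 16-sample frame folded into 8 samples.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform, adequate for tiny examples."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def time_alias(frame, n_fft):
    """Fold `frame` into n_fft samples by summing consecutive n_fft-long chunks."""
    if len(frame) % n_fft:
        raise ValueError("frame length must be a multiple of n_fft")
    folded = [0.0] * n_fft
    for i, value in enumerate(frame):
        folded[i % n_fft] += value
    return folded


n_fft, frame_length = 8, 16

# One cycle over the 16-sample frame: the second half is the negative of the
# first half, so the two 8-sample chunks cancel when summed.
one_cycle = [math.cos(2 * math.pi * t / frame_length) for t in range(frame_length)]
print(max(abs(v) for v in time_alias(one_cycle, n_fft)))  # effectively zero

# Two cycles over the frame: the chunks are in phase, so the tone survives
# with doubled amplitude.
two_cycles = [math.cos(2 * math.pi * 2 * t / frame_length) for t in range(frame_length)]
print(time_alias(two_cycles, n_fft)[0])  # 2.0

# Folding then transforming keeps only every second bin of the full-length DFT.
signal = [math.sin(0.7 * t) + 0.3 * t for t in range(frame_length)]
folded_spectrum = dft(time_alias(signal, n_fft))
full_spectrum = dft(signal)
print(all(abs(folded_spectrum[k] - full_spectrum[2 * k]) < 1e-9 for k in range(n_fft)))
```

Components that land on the discarded bins of the longer transform vanish entirely, which is the comb-filtering effect at the fft_length period.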

DAn.

1 reaction
bmcfee commented, Apr 14, 2018

> What’s the point of win_length independent of frame_length? If the frame samples outside of the win are zero’d out anyway? Unless we also allow win_length > frame_length, which I can at least imagine.

These are already independent, and it mostly has to do with over-sampling (high n_fft over a shorter window). Since n_fft is currently tied to the frame length, this is the only way to do it.

If n_fft and frame_length become independent, it’s still useful to keep win_length separate for the purposes of padding / frame centering.

> What happens when win_length > fft_length? We transform only the middle fft_length points? Or we time-alias?

You mean frame_length > fft_length? I had time-aliasing in mind. fftpack handles this without any additional effort.
