Semantics of n_fft, window length, and frame length
Description
In STFT and related methods (e.g., iirt), it is required that win_length <= n_fft, where n_fft is treated as the frame length, and win_length is the length of the window function applied to the center of the frame. This is all fine if we want to over-sample (have more frequency bins than are strictly necessary for the effective span of the window). The API currently does not provide a means to under-sample by specifying a large frame length and smaller number of FFT bins. (Believe it or not, there might be good reasons to want to do this sort of thing.)
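To make the over-sampling case concrete, here is a minimal plain-Python sketch of a short window centered in a longer frame by zero-padding. The helper names `hann` and `pad_window_to_frame` are hypothetical illustrations, not librosa functions, and the library's actual implementation may differ.

```python
import math


def hann(win_length):
    """Periodic Hann window of the given length (illustrative helper)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / win_length)
            for i in range(win_length)]


def pad_window_to_frame(window, n_fft):
    """Zero-pad `window` on both sides so it sits in the center of an n_fft frame."""
    win_length = len(window)
    if win_length > n_fft:
        # This mirrors the current requirement that win_length <= n_fft.
        raise ValueError("win_length must not exceed n_fft")
    left = (n_fft - win_length) // 2
    right = n_fft - win_length - left
    return [0.0] * left + list(window) + [0.0] * right


# A 4-sample window inside an 8-sample frame: the FFT of the frame has more
# bins than the 4 non-zero samples strictly require (spectral over-sampling).
padded = pad_window_to_frame(hann(4), 8)
```

Only the middle four entries of `padded` can be non-zero, so the extra FFT bins interpolate the spectrum rather than add resolution.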
What do folks think about relaxing the API, so that n_fft, win_length, and frame_length can each be independently specified? The semantics would be:
- frame_length = the number of samples per frame;
- win_length = the number of samples with non-zero window per frame;
- n_fft = the number of (output) frequency bins;
and the win_length <= n_fft requirement would relax to win_length <= frame_length. By default, frame_length would be left as None and inherit from n_fft, so the semantics of the API would be backward-compatible. But a user could override the frame length to be larger (or smaller) than the number of frequency bins.
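The proposed defaults and the relaxed constraint could be sketched roughly as follows. The function `resolve_frame_params` is a hypothetical helper written for this illustration, not part of librosa, and the choice to default win_length to frame_length is an assumption that the proposal does not spell out.

```python
def resolve_frame_params(n_fft, win_length=None, frame_length=None):
    """Resolve the three lengths under the proposed semantics (sketch only)."""
    if frame_length is None:
        # Backward-compatible default: the frame length inherits from n_fft.
        frame_length = n_fft
    if win_length is None:
        # Assumption: an unspecified window spans the whole frame.
        win_length = frame_length
    if win_length > frame_length:
        # Relaxed check: compare against frame_length instead of n_fft.
        raise ValueError("win_length must not exceed frame_length")
    return n_fft, win_length, frame_length


# Current behavior is preserved when frame_length is omitted.
legacy = resolve_frame_params(2048, win_length=1024)

# Under-sampling: a long frame with fewer frequency bins.
under_sampled = resolve_frame_params(512, win_length=1024, frame_length=2048)
```

With frame_length omitted, the check reduces to the existing win_length <= n_fft rule, which is what keeps the change backward-compatible.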
Any thoughts, @dpwe @lostanlen ?
Issue Analytics
- State:
- Created 5 years ago
- Reactions:1
- Comments:6 (6 by maintainers)
Top GitHub Comments
On Sat, Apr 14, 2018 at 9:10 AM, Brian McFee notifications@github.com wrote:
The current args to librosa.core.stft are n_fft (the length of the vector subject to the FFT), hop_length (the sample advance between successive frames), and win_length (the full cycle of the window function).
By making hop_length small relative to n_fft, you get frame oversampling (many successive frames including the same samples). By making win_length much shorter than n_fft, you get spectral oversampling (more frequency points than the theoretical maximum level of detail in the Fourier transform).
The actual number of samples “considered” in each transform is the min of n_fft and win_length. We might call this frame_length, although I’m not sure that’s what you were saying. If hop_length > frame_length, we’re completely skipping some samples in the input, which is rarely something you’d want to do. I’m OK with the library allowing the user to do this, although I think it ought to trigger a warning. It’s the trade-off between maximum flexibility and sanity checks.
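The sanity check suggested here might look something like the sketch below. Both function names are hypothetical and the warning text is invented for illustration; this is not how librosa currently behaves.

```python
import warnings


def effective_frame_length(n_fft, win_length):
    """Number of samples actually considered in each transform."""
    return min(n_fft, win_length)


def check_hop(n_fft, win_length, hop_length):
    """Warn if the hop skips input samples; return how many are skipped per hop."""
    frame_length = effective_frame_length(n_fft, win_length)
    skipped = max(0, hop_length - frame_length)
    if skipped:
        # Allowed, but almost never intended, so warn instead of raising.
        warnings.warn(
            f"hop_length={hop_length} exceeds the effective frame length "
            f"{frame_length}; {skipped} samples are skipped between frames",
            UserWarning,
        )
    return skipped


# Typical overlapping configuration: nothing is skipped and no warning is issued.
skipped_samples = check_hop(n_fft=2048, win_length=512, hop_length=256)
```

Issuing a warning rather than an error keeps the flexibility while still flagging the configuration that is rarely wanted.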
But I don’t understand a scenario in which you’d want to specify a frame_length other than min(n_fft, win_length). If you want to oversample the spectrum, use a win_length < n_fft. If you want to only transform the middle of a longer window (yucky because of the discontinuity at the edges, but OK), make win_length > n_fft. In either case, hop_length controls the number of output frames you get (mostly) independent of win_length and n_fft (excepting the edge effects).
Time-aliasing (= summing up each fft_length chunk of the longer input to make a single fft_length vector) is kind of horrible because it introduces comb filtering at the fft_length period (time-domain cancellation of sinusoidal components that are out of phase when chopped into fft_length pieces). Again, I’m not sure there’s any great value in supporting this; it seems to me that the number of people who actually wanted this behavior would be much smaller than the number of people who accidentally invoked it then got super confused by the results!
DAn.
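The time-aliasing described above can be illustrated with a small dependency-free sketch. The helpers `dft` and `time_alias` are hypothetical names written for this example; a naive DFT stands in for a real FFT routine. Folding a long frame into n_fft samples yields the same values as keeping every m-th bin of the full-length transform, and a sinusoid whose chunks are out of phase cancels entirely.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform, for illustration only."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def time_alias(frame, n_fft):
    """Sum successive n_fft-sample chunks of `frame` into one n_fft-sample vector."""
    if len(frame) % n_fft:
        raise ValueError("frame length must be a multiple of n_fft")
    return [sum(frame[i::n_fft]) for i in range(n_fft)]


# A 16-sample frame folded down to 4 samples (m = 4 chunks).
frame = [float((3 * i) % 7) for i in range(16)]
folded_spectrum = dft(time_alias(frame, 4))
decimated_spectrum = dft(frame)[::4]  # every 4th bin of the 16-point DFT

# Cancellation: a tone with an 8-sample period folded at n_fft = 4.
# The two chunks are half a period apart, so they sum to (nearly) zero.
tone = [math.cos(2 * math.pi * t / 8) for t in range(8)]
cancelled = time_alias(tone, 4)
```

In this example the tone disappears from the folded frame altogether, which is an extreme case of the comb filtering at the fft_length period mentioned above.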
These are already independent, and it mostly has to do with over-sampling (high n_fft over a shorter window). Since n_fft is currently tied to the frame length, this is the only way to do it.
If n_fft and frame_length become independent, it’s still useful to keep win_length separate for the purposes of padding / frame centering.
You mean frame_length > fft_length? I had time-aliasing in mind. fftpack handles this without any additional effort.