Time series
I’d like to reflect upon what functions would be desired (that could live within torchaudio or outside) in order to offer preliminary support for time series.
- Time series data format (e.g. many channels compared to audio)
- Missing data imputing
- Interaction with calendar information
- Conversion to other formats, say to audio waveform
- Option on transformations to respect time direction
- Streaming use case?
Could we make sure that our constructs are general enough to touch on time series, without sacrificing the primary goal of audio for this library?
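As one illustration of how the "missing data imputing" and "respect time direction" items could interact, here is a minimal sketch of a forward fill that only looks backwards in time. The function name `forward_fill` and the convention of marking gaps with None or NaN are assumptions made for this example, not an existing torchaudio API.

```python
import math
from typing import List, Optional


def forward_fill(x: List[Optional[float]],
                 fill_value: Optional[float] = None) -> List[Optional[float]]:
    """Replace missing samples (None or NaN) with the last observed value.

    Only past samples are consulted, so the imputation respects the
    time direction. Leading gaps get `fill_value` (hypothetical default: None).
    """
    out: List[Optional[float]] = []
    last = fill_value
    for v in x:
        missing = v is None or (isinstance(v, float) and math.isnan(v))
        if missing:
            out.append(last)
        else:
            last = v
            out.append(v)
    return out


filled = forward_fill([None, 1.0, float("nan"), 3.0, None])
```

A backward fill or an interpolation would use future samples, which is fine offline but not for forecasting or streaming use.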
Motivation
An audio (multichannel) waveform is a (vector) time series with a constant time step of 1/sample_rate seconds.
Audio processing and time series analysis are related, though their goals may differ. The types of transformations used in audio and in general time series are sometimes different (e.g. dB, Mel, …). For instance, in time series forecasting, transforms are usually expected to respect the time direction and to consume only past information when predicting a future value, as in “online” consumption of an audio waveform.
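To make the "respect the time direction" point concrete, here is a minimal sketch of a causal transform: a moving average whose output at step t depends only on samples at or before t. The helper names `sample_times` and `causal_moving_average` are hypothetical and chosen for this example; they are not torchaudio functions.

```python
from typing import List


def sample_times(num_samples: int, sample_rate: float) -> List[float]:
    """Timestamps in seconds for a uniformly sampled waveform (step = 1/sample_rate)."""
    return [n / sample_rate for n in range(num_samples)]


def causal_moving_average(x: List[float], window: int) -> List[float]:
    """Mean of the current sample and up to `window - 1` previous samples.

    No future samples are read, so the same code works in a streaming setting.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    out: List[float] = []
    running = 0.0
    for t, value in enumerate(x):
        running += value
        if t >= window:
            running -= x[t - window]  # drop the sample that left the window
        out.append(running / min(t + 1, window))
    return out


smoothed = causal_moving_average([1, 2, 3, 4], window=2)
```

A quick way to check causality is to change a future sample and confirm that earlier outputs stay the same. A centered window would fail that check.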
@nairbv @zdevito @kingjr @adefossez @gully – do you have use cases for time series that could relate to torchaudio?
Additional context
- Python pandas
- R tidyverse time series
- Prophet
- GPyTorch for PyTorch and Gaussian Processes
- Internal doc streaming (internal doc)
Issue Analytics
- State:
- Created 4 years ago
- Reactions:2
- Comments:5 (2 by maintainers)
Top GitHub Comments
For non-audio applications (e.g. in finance) I could imagine a number of useful features/functions.
I’m not familiar with audio time series requirements, but similar to what @gully describes above, there are a number of ways time series of financial data can be represented that may be broadly applicable:
Ideally a time-series representation should be flexible/abstract enough so that other representations can be added easily. Tools that convert representations of the data could be useful.
Some other functionality that could be useful in time series tools, at least if applied to certain financial problems:
Here are some perspectives on timeseries from the astronomy data perspective.
The astropy project had a discussion on tradeoffs surrounding a TimeSeries class for astronomical applications in their ongoing Proposal for Enhancement. There are some subtle discussions distinguishing two types of time series:
The distinction essentially comes down to sparsity-- populating zeros in between infrequent/discrete events is wasteful.
Here at the NASA Kepler/K2 Guest Observer Office we focus on high-precision flux time series: the brightness of a star measured every 30 minutes for four years, with quarterly gaps for transmitting the telescope data back to Earth. You can see that this acquisition rate yields a modest amount of data by the standards of audio: our “impressive” 70,000 time samples would be acquired in under 2 seconds of single-channel 44.1 kHz audio.
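The arithmetic behind that comparison can be sketched as follows. This is a back-of-the-envelope check that assumes a 365.25-day year and ignores the quarterly gaps; the variable names are chosen for this example only.

```python
# Rough sample count for a 30-minute cadence over four years (gaps ignored).
SECONDS_PER_DAY = 86_400
cadence_seconds = 30 * 60
duration_days = 4 * 365.25  # assumed year length

n_samples = duration_days * SECONDS_PER_DAY / cadence_seconds

# Time needed to record 70,000 samples of single-channel 44.1 kHz audio.
audio_seconds = 70_000 / 44_100

print(f"{n_samples:.0f} samples, {audio_seconds:.2f} s of audio")
```

This gives roughly 70,000 samples and about 1.6 seconds of audio, consistent with the figures quoted above.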
Some other distinctions: our time series data come with metadata headers that are generally preserved in our objects. Each time sample possesses columns (multichannels) of mixed data types: time, flux (float), flux uncertainty, quality flag (int), quality mask (bool), sky coordinate xy movement. Our in-house toolkit lightkurve deals with this time series data, with tons of application-specific pre-processing steps that wouldn’t matter much for a general time series class. The name nods to the convention of “light curves” rather than the audio-familiar waveforms.
We do frequency-domain analysis with FFTs all the time with some slight differences: we use an algorithm that can support unevenly sampled time spacings. We occasionally do spectrogram analysis, but you can see that a 70,000 sample signal can only be cut into 175 bins of nfft=400, which makes for a crude spectrogram.
Astronomers use scalable Gaussian Process analysis all the time. Popular frameworks are tailored towards 1D time series astronomy, but could (and should?) apply more broadly to time series applications that care about uncertainty quantification or probabilistic prediction. The GPyTorch framework is promising, and I aspire to create astronomy-specific demos to advertise this library more widely to astronomers. The fixed time sample size of audio makes it amenable to some of the geometric assumptions of GPyTorch.
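Returning to the frequency-domain point above: the comment does not name the algorithm used for unevenly sampled data here, so the sketch below assumes the classical Lomb-Scargle periodogram, one common choice for that setting. It is a plain-Python illustration on synthetic data, not the lightkurve implementation, and the function name `lomb_scargle` is hypothetical. The last line reproduces the 175-frame count for non-overlapping windows of nfft=400.

```python
import math
import random
from typing import List, Sequence


def lomb_scargle(t: Sequence[float], y: Sequence[float],
                 freqs: Sequence[float]) -> List[float]:
    """Classical (unnormalized) Lomb-Scargle power at each frequency.

    `t` may be unevenly spaced; `freqs` are positive, in cycles per unit time.
    """
    mean_y = sum(y) / len(y)
    yc = [v - mean_y for v in y]  # remove the mean before fitting sinusoids
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # The offset tau makes the sine and cosine terms orthogonal on this sampling.
        tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                         sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        yc_c = sum(a * b for a, b in zip(yc, c))
        yc_s = sum(a * b for a, b in zip(yc, s))
        power.append(0.5 * (yc_c ** 2 / sum(v * v for v in c)
                            + yc_s ** 2 / sum(v * v for v in s)))
    return power


# Synthetic test signal: a 1.5 cycles-per-unit sine at 200 random times.
rng = random.Random(0)
t = sorted(rng.uniform(0.0, 20.0) for _ in range(200))
y = [math.sin(2 * math.pi * 1.5 * ti) for ti in t]
freqs = [0.1 * k for k in range(1, 31)]

power = lomb_scargle(t, y, freqs)
peak_freq = freqs[power.index(max(power))]

# Non-overlapping spectrogram frames for a 70,000-sample signal with nfft=400.
num_frames = 70_000 // 400
```

On this synthetic signal the strongest peak lands at the injected frequency even though the time stamps are irregular, which a plain FFT on the raw samples would not handle directly.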
Those are some thoughts for now. Very curious to see how these themes evolve!