Time series

I’d like to reflect upon what functions would be desired (that could live within torchaudio or outside) in order to offer preliminary support for time series.

Time series data format (e.g. many channels compared to audio)
Missing data imputing
Interaction with calendar information
Conversion to other formats, say to audio waveform
Option on transformations to respect time direction
Streaming use case?

Could we make sure that our constructs are general enough to touch on time series, without sacrificing the primary goal of audio for this library?

Motivation

An audio (multichannel) waveform is a (vector) time series with a constant time step of 1/sample_rate seconds between samples.
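As a minimal sketch of that correspondence in plain Python: sample index n maps to time n / sample_rate, and a multichannel waveform becomes a sequence of (time, vector) pairs. The helper name `sample_times` is hypothetical and is not a torchaudio function.

```python
def sample_times(num_samples, sample_rate):
    """Return the timestamp in seconds of each sample of a uniformly sampled signal."""
    step = 1.0 / sample_rate  # constant time step between consecutive samples
    return [n * step for n in range(num_samples)]


# A multichannel waveform: one list of samples per channel, all sharing one time axis.
sample_rate = 4  # Hz, deliberately tiny so the timestamps are easy to read
waveform = [
    [0.0, 0.5, 1.0, 0.5],   # channel 0
    [1.0, 0.0, -1.0, 0.0],  # channel 1
]

times = sample_times(len(waveform[0]), sample_rate)

# Pair each timestamp with the vector of channel values at that instant.
series = list(zip(times, zip(*waveform)))
```

Seen this way, the only thing that separates a waveform from a general time series is that the timestamps are implicit and evenly spaced.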

Audio processing and time series analysis are related, though their goals may differ. The transformations used in audio (e.g. dB, Mel, …) are sometimes different from those used for general time series. For instance, in time series forecasting, transforms are usually expected to respect the time direction and to consume only past information when producing a future value, as in “online” consumption of an audio waveform.
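To illustrate what "respecting the time direction" could mean, here is a small sketch in plain Python that contrasts a causal moving average (current and past samples only) with a centered one (which also reads future samples). Both function names are made up for this example.

```python
def causal_moving_average(values, window):
    """Average of the current value and up to `window - 1` preceding values."""
    out = []
    running_sum = 0.0
    for i, v in enumerate(values):
        running_sum += v
        if i >= window:
            running_sum -= values[i - window]  # drop the value that left the window
        out.append(running_sum / min(i + 1, window))
    return out


def centered_moving_average(values, window):
    """Average over a window centered on each point; this looks into the future."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


signal = [1.0, 2.0, 3.0, 4.0, 100.0]  # the spike arrives at the last time step
causal = causal_moving_average(signal, window=3)
centered = centered_moving_average(signal, window=3)
```

At index 3 the centered average already reflects the spike at index 4, while the causal average does not. In a forecasting setting the centered version would leak future information.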

@nairbv @zdevito @kingjr @adefossez @gully – do you have use cases for time series that could relate to torchaudio?

Additional context

Python pandas
R tidyverse time series
Prophet
GPyTorch for PyTorch and Gaussian Processes
Internal doc streaming (internal doc)

Issue Analytics

State:
Created 4 years ago
Reactions: 2
Comments: 5 (2 by maintainers)

Top GitHub Comments

2 reactions

nairbv commented, Jan 18, 2020

For non-audio applications (e.g. in finance) I could imagine a number of useful features/functions.

I’m not familiar with audio time series requirements, but similar to what @gully describes above, there are a number of ways time series of financial data can be represented that may be broadly applicable:

In raw trade or tick data, each data point discretely represents a trade or price change. Some data might include each change to the best bid and ask.
Tick data is typically aggregated into “candlesticks” as open (start), low, high, close (final), volume (total number of shares traded) per time period. Each period then ends up being represented as a vector of these five values.
Other approaches similar to the “sampled time series” described above by @gully would be open/low/close/high/duration per N trades or shares traded or ticks. There are a variety of approaches like this that can be used for summarizing “bars” of discrete financial time series data.
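The time-based candlestick aggregation described above can be sketched in plain Python as follows. The tick layout `(timestamp, price, size)`, the function name `ticks_to_candles`, and the sample data are assumptions made for illustration, not an existing API.

```python
def ticks_to_candles(ticks, period):
    """Aggregate (timestamp, price, size) ticks into per-period OHLCV bars.

    Ticks are assumed to be sorted by timestamp, with timestamps in seconds.
    Returns a dict mapping each bar's start time to its five summary values.
    """
    candles = {}
    for timestamp, price, size in ticks:
        bar_start = (timestamp // period) * period  # start of the enclosing period
        bar = candles.get(bar_start)
        if bar is None:
            # The first tick of a period sets every price field.
            candles[bar_start] = {
                "open": price, "low": price, "high": price,
                "close": price, "volume": size,
            }
        else:
            bar["low"] = min(bar["low"], price)
            bar["high"] = max(bar["high"], price)
            bar["close"] = price  # the last tick seen so far closes the bar
            bar["volume"] += size
    return candles


# Hypothetical ticks: (seconds since start, trade price, shares traded).
ticks = [(0, 10.0, 100), (20, 10.5, 50), (45, 9.8, 200), (61, 10.1, 10), (119, 10.4, 30)]
candles = ticks_to_candles(ticks, period=60)
```

Bars per N trades or per N shares would follow the same pattern, with the bar key derived from a running trade or share count instead of the timestamp.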

Ideally a time-series representation should be flexible/abstract enough so that other representations can be added easily. Tools that convert representations of the data could be useful.

Some other functionality that could be useful in time series tools, at least if applied to certain financial problems:

For “Interaction with calendar information,” it could be useful to have a way to “join” multiple time series from different sources.
- One may want to train a single model with data from multiple securities aligned on time.
- Maybe also useful for multi-modal models or stereo audio?
A way to augment time-series data with cumulative or moving averages, stdev, etc.
- Traders often augment their price data with a variety of derived metrics (Bollinger Bands, EMA, SMA, MACD, etc). I’m not sure if there are similar derived metrics for audio time series.
Forecasting data loaders that help deal with look-ahead or recency bias, maybe using sliding time windows?
- It’s easy to introduce look-ahead bias, especially when a model is trained online or learns incrementally.
- One wouldn’t want to re-train a model from scratch with each new tick, but could use some kind of sampling method to incorporate new information while controlling or eliminating recency bias.
- Ways to preprocess the data during loading, e.g. to convert values to deltas or returns
Ways to test for and adjust for stationarity.
- One might want to normalize a return series with the mean return, but would need to use a cumulative or rolling historical mean to avoid look-ahead bias.
Something for generating simplistic auto-regressive test time series could be useful (http://www.jessicayung.com/generating-autoregressive-data-for-experiments/)
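A few of the helpers listed above are small enough to sketch in plain Python: an AR(1) test-series generator, conversion to deltas, normalization by a cumulative historical mean, and sliding windows for forecasting. All names and default parameters are assumptions chosen for this example; in particular, the generator is a generic AR(1) recursion and is not taken from the linked article.

```python
import random


def generate_ar1(n, phi=0.8, sigma=1.0, seed=0):
    """Generate x[t] = phi * x[t-1] + noise, with Gaussian noise of std `sigma`."""
    rng = random.Random(seed)  # seeded so that experiments are repeatable
    series, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, sigma)
        series.append(prev)
    return series


def to_deltas(values):
    """Convert levels to first differences (one element shorter than the input)."""
    return [b - a for a, b in zip(values, values[1:])]


def expanding_mean_normalize(values):
    """Subtract from each value the mean of strictly earlier values only."""
    out, total = [], 0.0
    for i, v in enumerate(values):
        past_mean = total / i if i > 0 else 0.0  # no history yet at the first step
        out.append(v - past_mean)
        total += v
    return out


def sliding_windows(values, history, horizon=1):
    """Yield (past, future) pairs; `future` never overlaps with `past`."""
    for end in range(history, len(values) - horizon + 1):
        yield values[end - history:end], values[end:end + horizon]


series = generate_ar1(200)
pairs = list(sliding_windows(expanding_mean_normalize(to_deltas(series)), history=10))
```

Each transform reads only values at or before the current index, so none of them introduces look-ahead bias by construction.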

1 reaction

gully commented, Oct 12, 2019

Here are some perspectives on timeseries from the astronomy data perspective.

The astropy project had a discussion on tradeoffs surrounding a TimeSeries class for astronomical applications in their ongoing Proposal for Enhancement. There are some subtle discussions distinguishing two types of time series:

sampled time series that sum up a count rate observed over a time interval, such as how many photons were received from a telescope sensor in a 30 minute interval
event data that are timestamps of discrete events, such as the energy of a single proton measured at the instant it impinges on a sensor.

The distinction essentially comes down to sparsity: populating zeros in between infrequent, discrete events is wasteful.
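The relationship between the two representations can be sketched in plain Python by binning event timestamps into fixed-width intervals. The function name `bin_events` and the sample timestamps are hypothetical.

```python
def bin_events(event_times, bin_width, start, stop):
    """Count events in each interval [start + k * bin_width, start + (k + 1) * bin_width)."""
    num_bins = int((stop - start) // bin_width)
    counts = [0] * num_bins
    for t in event_times:
        index = int((t - start) // bin_width)
        if 0 <= index < num_bins:  # ignore events outside the observed range
            counts[index] += 1
    return counts


# Three discrete events observed over ten seconds.
events = [0.4, 0.5, 7.2]
counts = bin_events(events, bin_width=1.0, start=0.0, stop=10.0)

# Most bins are empty, which is the sparsity cost of the sampled representation.
empty_bins = counts.count(0)
```

Here the event list stores three numbers, while the sampled series stores ten, eight of which are zero.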

Here at the NASA Kepler/K2 Guest Observer Office we focus on high-precision flux time series: the brightness of a star measured every 30 minutes for four years, with quarterly gaps for transmitting the telescope data back to Earth. You can see that this acquisition rate yields a modest amount of data by the standards of audio: our “impressive” 70,000 time samples would be acquired in under 2 seconds of single-channel 44.1 kHz audio.
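The arithmetic behind those figures can be checked in a few lines of Python. This is a rough estimate: the 365.25-day year is an assumption, and the quarterly gaps are ignored.

```python
cadence_minutes = 30
mission_years = 4

samples_per_day = 24 * 60 // cadence_minutes  # one sample every 30 minutes
num_samples = int(mission_years * 365.25 * samples_per_day)  # upper bound, gaps ignored

# Time needed to record 70,000 samples of single-channel audio at 44.1 kHz.
audio_seconds = 70_000 / 44_100
```

Four years of 30-minute cadence comes to roughly 70,000 samples, which matches the figure quoted above, and 44.1 kHz audio reaches the same count in about 1.6 seconds.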

Some other distinctions: our time series data come with metadata headers that are generally preserved in our objects. Each time sample possesses columns (multichannels) of mixed data types: time, flux (float), flux uncertainty, quality flag (int), quality mask (bool), sky coordinate xy movement. Our in-house toolkit lightkurve deals with this time series data, with tons of application-specific pre-processing steps that wouldn’t matter much for a general time series class. The name nods to the convention of “light curves” rather than the audio-familiar waveforms.

We do frequency-domain analysis with FFTs all the time with some slight differences: we use an algorithm that can support unevenly sampled time spacings. We occasionally do spectrogram analysis, but you can see that a 70,000 sample signal can only be cut into 175 bins of nfft=400, which makes for a crude spectrogram.
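Two short sketches in plain Python may help here. The first reproduces the frame count, assuming non-overlapping frames. The second is a classic Lomb-Scargle periodogram, one well-known method for unevenly sampled data; the comment does not name the algorithm actually used, so treating it as Lomb-Scargle is an assumption, and the test signal is synthetic.

```python
import math
import random

# Non-overlapping frames of nfft samples that fit in the full signal.
num_frames = 70_000 // 400


def lomb_scargle(times, values, angular_freqs):
    """Unnormalized Lomb-Scargle power at each angular frequency."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    power = []
    for w in angular_freqs:
        # The time offset tau makes the sine and cosine terms orthogonal.
        s2 = sum(math.sin(2 * w * t) for t in times)
        c2 = sum(math.cos(2 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2 * w)
        cos_terms = [math.cos(w * (t - tau)) for t in times]
        sin_terms = [math.sin(w * (t - tau)) for t in times]
        yc = sum(y * c for y, c in zip(centered, cos_terms))
        ys = sum(y * s for y, s in zip(centered, sin_terms))
        cc = sum(c * c for c in cos_terms)
        ss = sum(s * s for s in sin_terms)
        power.append(0.5 * (yc * yc / cc + ys * ys / ss))
    return power


# Synthetic test: a 0.5 Hz sinusoid observed at 200 random, uneven times.
rng = random.Random(0)
times = sorted(rng.uniform(0.0, 40.0) for _ in range(200))
values = [math.sin(2 * math.pi * 0.5 * t) for t in times]

freqs = [0.05 * k for k in range(1, 41)]  # trial frequencies in Hz, 0.05 to 2.0
power = lomb_scargle(times, values, [2 * math.pi * f for f in freqs])
peak_freq = freqs[power.index(max(power))]
```

No resampling onto a regular grid is needed: the periodogram is evaluated directly at the observed timestamps.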

Astronomers use scalable Gaussian Process analysis all the time. Popular frameworks are tailored towards 1D time series astronomy, but could (and should?) apply more broadly to time series applications that care about uncertainty quantification or probabilistic prediction. The GPyTorch framework is promising, and I aspire to create astronomy-specific demos to advertise this library more widely to astronomers. The fixed time sample size of audio makes it amenable to some of the geometric assumptions of GPyTorch.

Those are some thoughts for now. Very curious to see how these themes evolve!