FeatureSet - scope
I’m thinking about the FeatureSet, and I’m not sure what scope of operations we’d like to support in lhotse. We will use lilcom to load/store the feature matrices, but what about feature extraction? Should we just use something precomputed, e.g. with Kaldi, or also extract features on the fly at the FeatureSet API level? If the latter, we’ll either need to use some other library (e.g. librosa) or delegate feature extraction to Kaldi by running it as a subprocess (unless there are some Python bindings available). I guess the same questions apply to data augmentation (we’ll get to that after having something initial working for features and having some example dataset represented in lhotse).
Of course, having the whole data augmentation + feature extraction pipeline as a part of lhotse would be more convenient in the long run. It’ll just take longer to get there. @danpovey @jtrmal WDYT?
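To make the two options concrete, here is a minimal sketch of a container that serves precomputed features when they exist and falls back to on-the-fly extraction otherwise. Everything in it is an assumption for illustration: the `FeatureSet` class below is hypothetical and not the lhotse API, `frame_energy` is a toy stand-in for a real extractor (Kaldi, librosa, etc.), and the in-memory dict stands in for matrices stored with lilcom.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence

# A feature matrix as a list of frames, each frame a list of floats.
FeatureMatrix = List[List[float]]
# Hypothetical extractor signature: raw samples in, feature matrix out.
Extractor = Callable[[Sequence[float]], FeatureMatrix]


def frame_energy(samples: Sequence[float], frame_len: int = 4) -> FeatureMatrix:
    """Toy extractor: one energy value per non-overlapping frame."""
    n_frames = len(samples) // frame_len
    return [
        [sum(s * s for s in samples[i * frame_len:(i + 1) * frame_len])]
        for i in range(n_frames)
    ]


@dataclass
class FeatureSet:
    """Hypothetical container for illustration only, not the lhotse API."""

    extractor: Optional[Extractor] = None
    # Stand-in for feature matrices that were computed and stored earlier.
    precomputed: Dict[str, FeatureMatrix] = field(default_factory=dict)

    def load(
        self, recording_id: str, samples: Optional[Sequence[float]] = None
    ) -> FeatureMatrix:
        # Prefer stored features when they are available.
        if recording_id in self.precomputed:
            return self.precomputed[recording_id]
        # Otherwise fall back to on-the-fly extraction, if it is configured.
        if self.extractor is None or samples is None:
            raise KeyError(f"No stored features for {recording_id!r} and no extractor input")
        return self.extractor(samples)


features = FeatureSet(extractor=frame_energy, precomputed={"rec-1": [[0.5], [0.25]]})
stored = features.load("rec-1")
computed = features.load("rec-2", samples=[1.0, 1.0, 1.0, 1.0, 2.0, 0.0, 0.0, 0.0])
```

Keeping the extractor as a plain callable would let the same interface cover a Python library or a wrapper around a Kaldi subprocess, so the scope decision could be deferred.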
Issue Analytics
- Created 3 years ago
- Comments: 10 (3 by maintainers)
Top GitHub Comments
Makes sense I guess (although we’d have to make sure the defaults were stable when we do the release).
It might make sense to support writing the manifest files compressed, as they could get large and should be highly compressible.
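As a rough sketch of the compressed-manifest idea, the snippet below writes a JSON manifest through gzip and reads it back using only the standard library. The manifest fields (`recording_id`, `storage_path`, `num_frames`, `num_features`) and the file name are made up for the example; the actual lhotse manifest layout is not specified in this thread.

```python
import gzip
import json
import os
import tempfile
from typing import Any, Dict, List


def save_manifest(entries: List[Dict[str, Any]], path: str) -> None:
    """Write the manifest as gzip-compressed JSON text."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(entries, f)


def load_manifest(path: str) -> List[Dict[str, Any]]:
    """Read a gzip-compressed JSON manifest back into Python objects."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


# Hypothetical entries: the field names and paths are assumptions.
entries = [
    {
        "recording_id": f"rec-{i}",
        "storage_path": f"feats/rec-{i}.bin",
        "num_frames": 1000,
        "num_features": 40,
    }
    for i in range(1000)
]

manifest_path = os.path.join(tempfile.mkdtemp(), "features.json.gz")
save_manifest(entries, manifest_path)
restored = load_manifest(manifest_path)

# Compare the size on disk with the uncompressed JSON size.
raw_size = len(json.dumps(entries).encode("utf-8"))
compressed_size = os.path.getsize(manifest_path)
```

Because manifest entries repeat the same keys and similar paths, the compressed file should come out much smaller than the raw JSON, at the cost of not being directly greppable.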
torchaudio only uses librosa for running compatibility tests; they wrote their own (compatible) feature extraction routines as PyTorch jit-able modules (including deltas and sliding CMN). They seem to have implemented support for two backends for reading audio files (sox and libsoundfile, the latter also works on Windows…) and are working on replacing sox effects with PyTorch versions (see https://github.com/pytorch/audio/issues/260 for a list of what they already implemented). I guess the point of that effort is to be able to use them on the fly during training.
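For readers unfamiliar with sliding CMN, here is a minimal pure-Python sketch of the idea: for each frame, subtract the per-coefficient mean computed over a window of neighbouring frames. This is not torchaudio's or Kaldi's implementation; the function name `sliding_cmn` is hypothetical, and the edge handling (the window is simply truncated at the start and end of the utterance) is a simplifying assumption.

```python
from typing import List


def sliding_cmn(frames: List[List[float]], window: int = 3) -> List[List[float]]:
    """Subtract, per coefficient, the mean over a centered window of frames."""
    half = window // 2
    normalized = []
    for t in range(len(frames)):
        # Truncate the window at the utterance boundaries (a simplification).
        lo = max(0, t - half)
        hi = min(len(frames), t + half + 1)
        context = frames[lo:hi]
        # zip(*context) iterates over coefficients (columns) of the window.
        means = [sum(col) / len(context) for col in zip(*context)]
        normalized.append([x - m for x, m in zip(frames[t], means)])
    return normalized


frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
normalized = sliding_cmn(frames, window=3)
# normalized == [[-0.5, -5.0], [0.0, 0.0], [0.0, 0.0], [0.5, 5.0]]
```

A transform like this only needs the frames inside its window, which is what makes it cheap enough to apply on the fly during training.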