First cut and then extract features?
EDIT: I misread the code. The part below creates the cuts from the manifests. The feature extraction happens afterwards, as recommended by the docs, using this part:
cuts = cuts.compute_and_store_features(
    extractor=Fbank(),
    storage_path='feats',
    num_jobs=8
).pad(duration=5.0)
From https://lhotse.readthedocs.io/en/v0.6_g/features.html: "We retrieve the arrays by loading the whole feature matrix from disk and selecting the relevant region (e.g. specified by a cut). Therefore it makes sense to cut the recordings first, and then extract the features for them to avoid loading unnecessary data from disk (especially for very long recordings)."
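The quoted point can be illustrated with plain NumPy (this is not Lhotse's actual storage code; the matrix sizes are made-up numbers for illustration):

```python
import numpy as np

# "Extract first": one feature matrix covers a long recording.
# E.g. 1 hour of audio at 100 frames/sec with 80-dim Fbank features.
full_feats = np.zeros((360_000, 80), dtype=np.float32)

# Reading the frames for a single 5-second cut means loading the whole
# matrix and then selecting the relevant region:
cut_frames = full_feats[1000:1500]  # 500 frames out of 360,000

# "Cut first": features are stored per cut, so loading one cut's
# features touches only that cut's (much smaller) array.
per_cut_feats = np.zeros((500, 80), dtype=np.float32)

# Ratio of bytes read per training example in the two schemes:
print(full_feats.nbytes // per_cut_feats.nbytes)  # 720
```

The 720x gap is why the docs recommend cutting long recordings before extracting features when the cuts are known in advance.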
But if I understand the part below from the 'Getting started' example correctly, it computes the features first and then cuts them. Wouldn't that mean that loading the features for those cuts will be less efficient, because the features are stored as one whole matrix rather than separately for each cut?
# CutSet is the workhorse of Lhotse, allowing for flexible data manipulation.
# We create 5-second cuts by traversing SWBD recordings in windows.
# No audio data is actually loaded into memory or stored to disk at this point.
cuts = CutSet.from_manifests(
    recordings=swbd['recordings'],
    supervisions=swbd['supervisions']
).cut_into_windows(duration=5)
I know the docs are not complete yet; that's why I wanted to ask 😃
Thanks in advance!
Issue Analytics
- Created 2 years ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
Perfect, thanks a lot for your advice. I'm working on a laughter detector for my Bachelor's project and really appreciate your work and helpful support.
Thanks a lot.
Yes, it will still be efficient.
I'd say: if your training examples are fixed (e.g. ASR training, where cut == supervision), then first cut, then extract features. If your training examples are dynamic (e.g. you're sampling chunks for self-supervised training, VAD, etc.), then it's definitely better to extract first, then cut.
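The "dynamic examples" case can be sketched with plain NumPy (not the Lhotse API; the recording length and frame rate are assumed for illustration). Features are extracted once over the whole recording, and fresh chunks are sliced out of the stored matrix every epoch, so nothing is recomputed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Features extracted once over a whole recording:
# 6 minutes at 100 frames/sec, 80-dim features.
frames_per_rec = 36_000
feats = rng.standard_normal((frames_per_rec, 80)).astype(np.float32)

def sample_chunk(feats: np.ndarray, num_frames: int = 500) -> np.ndarray:
    """Sample a random 5-second (500-frame) training chunk.

    Because the chunk boundaries change every epoch, per-cut feature
    storage would be wasted work; slicing the whole-recording matrix
    reuses one extraction pass forever.
    """
    start = rng.integers(0, feats.shape[0] - num_frames)
    return feats[start:start + num_frames]

chunk = sample_chunk(feats)
print(chunk.shape)  # (500, 80)
```

If the chunks were fixed ahead of time (as in ASR, one cut per supervision), cutting first and extracting per cut avoids reading the unused frames, which is the other half of the advice above.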