First cut and then extract features?
EDIT: I misread the code. The part below creates the cuts from the manifests. The feature extraction happens afterwards, as recommended by the docs, using this part:
cuts = cuts.compute_and_store_features(
    extractor=Fbank(),
    storage_path='feats',
    num_jobs=8
).pad(duration=5.0)
From https://lhotse.readthedocs.io/en/v0.6_g/features.html: "We retrieve the arrays by loading the whole feature matrix from disk and selecting the relevant region (e.g. specified by a cut). Therefore it makes sense to cut the recordings first, and then extract the features for them to avoid loading unnecessary data from disk (especially for very long recordings)."
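The quoted point can be illustrated with plain NumPy (this is not Lhotse's actual storage code; the matrix sizes are made-up numbers for illustration):

```python
import numpy as np

# "Extract first": one feature matrix covers a long recording.
# E.g. 1 hour of audio at 100 frames/sec with 80-dim Fbank features.
full_feats = np.zeros((360_000, 80), dtype=np.float32)

# Reading the frames for a single 5-second cut means loading the whole
# matrix and then selecting the relevant region:
cut_frames = full_feats[1000:1500]  # 500 frames out of 360,000

# "Cut first": features are stored per cut, so loading one cut's
# features touches only that cut's (much smaller) array.
per_cut_feats = np.zeros((500, 80), dtype=np.float32)

# Ratio of bytes read per training example in the two schemes:
print(full_feats.nbytes // per_cut_feats.nbytes)  # 720
```

The 720x gap is why the docs recommend cutting long recordings before extracting features when the cuts are known in advance.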
But if I understand the part below from the 'Getting started' example correctly, it computes the features first and then cuts them. Wouldn't that mean that loading the features for those cuts will be less efficient, because the features are stored as one whole matrix rather than separately for each cut?
# CutSet is the workhorse of Lhotse, allowing for flexible data manipulation.
# We create 5-second cuts by traversing SWBD recordings in windows.
# No audio data is actually loaded into memory or stored to disk at this point.
cuts = CutSet.from_manifests(
    recordings=swbd['recordings'],
    supervisions=swbd['supervisions']
).cut_into_windows(duration=5)
I know the docs are not complete yet; that's why I wanted to ask 😃
Thanks in advance!
Issue Analytics
- Created 2 years ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
Perfect, thanks a lot for your advice. I'm working on a laughter detector for my Bachelor's project and really appreciate your work and helpful support.
Thanks a lot.
Yes, it will still be efficient.
I'd say: if your training examples are fixed (e.g. ASR training, where cut == supervision), then first cut, then extract features. If your training examples are dynamic (e.g. you're sampling chunks for self-supervised training, VAD, etc.), then it's definitely better to extract first, then cut.
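The "dynamic examples" case can be sketched with plain NumPy (not the Lhotse API; the recording length and frame rate are assumed for illustration). Features are extracted once over the whole recording, and fresh chunks are sliced out of the stored matrix every epoch, so nothing is recomputed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Features extracted once over a whole recording:
# 6 minutes at 100 frames/sec, 80-dim features.
frames_per_rec = 36_000
feats = rng.standard_normal((frames_per_rec, 80)).astype(np.float32)

def sample_chunk(feats: np.ndarray, num_frames: int = 500) -> np.ndarray:
    """Sample a random 5-second (500-frame) training chunk.

    Because the chunk boundaries change every epoch, per-cut feature
    storage would be wasted work; slicing the whole-recording matrix
    reuses one extraction pass forever.
    """
    start = rng.integers(0, feats.shape[0] - num_frames)
    return feats[start:start + num_frames]

chunk = sample_chunk(feats)
print(chunk.shape)  # (500, 80)
```

If the chunks were fixed ahead of time (as in ASR, one cut per supervision), cutting first and extracting per cut avoids reading the unused frames, which is the other half of the advice above.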