First cut and then extract features?

EDIT: I misread the code. The part below creates the cuts from the manifest. The feature extraction happens afterwards as recommended by the docs, using this part:

cuts = cuts.compute_and_store_features(
    extractor=Fbank(),
    storage_path='feats',
    num_jobs=8
).pad(duration=5.0)


“We retrieve the arrays by loading the whole feature matrix from disk and selecting the relevant region (e.g. specified by a cut). Therefore it makes sense to cut the recordings first, and then extract the features for them to avoid loading unnecessary data from disk (especially for very long recordings).”
Source: https://lhotse.readthedocs.io/en/v0.6_g/features.html
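To make the trade-off in that docs passage concrete, here is a toy cost model. It is not Lhotse code: `num_frames`, `frames_read_whole_matrix` and `frames_read_per_cut` are hypothetical helpers, the 10 ms frame shift is an assumption, and the model takes the quoted passage at face value (a stored matrix is always read in full to serve a cut).

```python
# Toy cost model, not Lhotse code: counts how many feature frames are read
# from disk to serve every fixed-length cut of one long recording.

FRAME_SHIFT = 0.01  # seconds per frame (assumed value)


def num_frames(duration_s: float) -> int:
    """Number of frames covering `duration_s` seconds."""
    return round(duration_s / FRAME_SHIFT)


def frames_read_whole_matrix(recording_s: float, cut_s: float) -> int:
    """Features stored once per recording: every cut loads the full matrix."""
    num_cuts = int(recording_s // cut_s)
    return num_cuts * num_frames(recording_s)


def frames_read_per_cut(recording_s: float, cut_s: float) -> int:
    """Features stored per cut: every cut loads only its own matrix."""
    num_cuts = int(recording_s // cut_s)
    return num_cuts * num_frames(cut_s)


if __name__ == "__main__":
    recording_s, cut_s = 600.0, 5.0  # a 10-minute recording, 5-second cuts
    print("whole matrix:", frames_read_whole_matrix(recording_s, cut_s))
    print("per cut:     ", frames_read_per_cut(recording_s, cut_s))
```

Under these assumptions the whole-matrix layout reads `recording_s / cut_s` times as many frames (120 times here), which is the overhead the docs passage warns about for very long recordings.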

But if I understand the part below from the ‘Getting started’ example correctly, it computes the features first and then cuts them. Wouldn’t that make loading the features for those cuts less efficient, because the features are stored as a whole per recording rather than separately for each cut?

# CutSet is the workhorse of Lhotse, allowing for flexible data manipulation.
# We create 5-second cuts by traversing SWBD recordings in windows.
# No audio data is actually loaded into memory or stored to disk at this point.
cuts = CutSet.from_manifests(
    recordings=swbd['recordings'],
    supervisions=swbd['supervisions']
).cut_into_windows(duration=5)

I know the docs are not complete yet, which is why I wanted to ask 😃

Thanks in advance!

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

LasseWolter commented, Feb 7, 2022 (1 reaction)

Perfect, thanks a lot for your advice. I’m working on a laughter detector for my Bachelor project and really appreciate your work and helpful support.

Thanks a lot.

pzelasko commented, Feb 7, 2022 (1 reaction)

Yes, it will still be efficient.

I’d say if your training examples are fixed (e.g. ASR training, where cut == supervision), then first cut, then extract features. If your training examples are dynamic (e.g. you’re sampling chunks for self-supervised training, VAD, etc.), then it’s definitely better to extract first, then cut.
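A minimal sketch of the dynamic case ("extract, then cut"), for illustration only. It is not Lhotse code: `toy_extract` is a hypothetical stand-in for a real feature extractor and `sample_chunk` is a hypothetical sampler. The point is that features are computed once per recording, and each training example is then just a slice of the stored matrix.

```python
# Toy illustration, not Lhotse code: extract features once per recording,
# then slice random fixed-length chunks from them on demand.
import random
from typing import List


def toy_extract(samples: List[float], frame_len: int = 4) -> List[float]:
    """Hypothetical stand-in for a feature extractor: one mean per frame."""
    return [
        sum(samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def sample_chunk(
    features: List[float], chunk_frames: int, rng: random.Random
) -> List[float]:
    """Slice a random contiguous chunk out of precomputed features."""
    start = rng.randrange(len(features) - chunk_frames + 1)
    return features[start:start + chunk_frames]


if __name__ == "__main__":
    rng = random.Random(0)
    recording = [float(i) for i in range(400)]  # fake audio samples
    features = toy_extract(recording)           # extraction happens once
    chunks = [sample_chunk(features, 10, rng) for _ in range(3)]
    print([len(chunk) for chunk in chunks])
```

With fixed examples the chunk boundaries are known up front, so extracting per cut stores exactly what training will read. With dynamic sampling the boundaries change on every draw, so extracting per chunk would mean recomputing features each time.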
