Feature: AudioCutting feature extractor

Is it currently possible to use CutSet to cut all recordings into separate wav files and dump these files to disk? If not, would it be a feature worth considering?

It could be used in a similar way to feature extraction, for example with a proposed AudioCutter and AudioWriter:

recording_set = RecordingSet.from_yaml('audio.yml')
audio_cutter = AudioCutter()

with AudioWriter() as storage:
    builder = FeatureSetBuilder(feature_extractor=audio_cutter, storage=storage)
    feature_set = builder.process_and_store_recordings(
        recordings=recording_set,
        num_jobs=8
    )

Alternatively, CutSet could simply have a method such as CutSet.cut_audio(path). The audio cuts could be stored in subfolders of path based, e.g., on the recording name. Along with the wav cuts, there would also be a manifest file for the Cuts. Let me know if this would be of interest or if something similar is already possible. Thanks.
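To make the request concrete, here is a minimal sketch of the idea using only the Python standard library `wave` module. It is not Lhotse code: the function name `cut_wav`, the cut IDs, and the `(start, duration)` mapping are assumptions made up for illustration, and it only handles uncompressed WAV input.

```python
import wave
from pathlib import Path
from typing import Dict, Tuple


def cut_wav(src: Path, out_dir: Path, cuts: Dict[str, Tuple[float, float]]) -> Dict[str, Path]:
    """Write each segment of `src` to its own WAV file.

    `cuts` maps a (hypothetical) cut ID to (start_seconds, duration_seconds).
    Files are stored as out_dir/<recording name>/<cut ID>.wav.
    """
    written = {}
    target_dir = out_dir / src.stem  # one subfolder per recording
    target_dir.mkdir(parents=True, exist_ok=True)
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        rate = reader.getframerate()
        for cut_id, (start, duration) in cuts.items():
            # Seek to the first frame of the segment and read its frames.
            reader.setpos(int(round(start * rate)))
            frames = reader.readframes(int(round(duration * rate)))
            path = target_dir / f"{cut_id}.wav"
            with wave.open(str(path), "wb") as writer:
                # Reuse the source format; the frame count in the header
                # is corrected when the frames are written.
                writer.setparams(params)
                writer.writeframes(frames)
            written[cut_id] = path
    return written
```

The returned mapping of cut ID to file path is the kind of information a manifest for the Cuts would need to record.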

Top GitHub Comments

janvainer commented, Dec 26, 2020

Agreed, sorry for the late response. I was quite busy before Christmas. I can give it a try, but probably in a few weeks from now.

pzelasko commented, Nov 25, 2020

I think we can mimic the approach we took in saving the feature matrices to files, i.e. use the first three symbols from the Cut ID as a name for the sub-directory, and store the new recording (with name == cut ID) in it.

Maybe we can save it as a FLAC rather than WAV to save some space - I think it’s supported by soundfile out-of-the-box.

So e.g. for cuts with IDs:

abc12345
abc34576
abd23452

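we’d create the following structure (one sub-directory per three-symbol prefix, one file per cut, assuming FLAC output):

abc/abc12345.flac
abc/abc34576.flac
abd/abd23452.flac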

You can find code that does a similar thing in the LilcomFilesWriter class.
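The naming rule described above (first three symbols of the Cut ID as the sub-directory, file name equal to the Cut ID) can be sketched as a small path helper. This is an illustration only and not the `LilcomFilesWriter` implementation: the function name `cut_audio_path`, the `audio` root directory, and the `flac` default extension are assumptions.

```python
from pathlib import PurePosixPath


def cut_audio_path(root: str, cut_id: str, extension: str = "flac") -> PurePosixPath:
    """Build <root>/<first three symbols of cut ID>/<cut ID>.<extension>."""
    return PurePosixPath(root) / cut_id[:3] / f"{cut_id}.{extension}"


# The three example cut IDs from the comment above; "audio" is a made-up root.
for cut_id in ["abc12345", "abc34576", "abd23452"]:
    print(cut_audio_path("audio", cut_id))
```

Grouping by a short ID prefix keeps any single directory from accumulating every file, and it works even when the data has no speaker annotations.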

On Nov 25, 2020, at 15:21, Jan Vainer notifications@github.com wrote:

@pzelasko being able to use augmentations and parallel execution is a really cool idea, I think. Sure, I will give it a try. Just a question: what should the directory structure for saving be? By recordings, or perhaps by speakers? The latter would not work if the data had no speaker annotations, so by recording would probably make the most sense. We could also do it by Cuts, though.
