Slow loading of MonoCut created from multi-channel Recording
Consider the situation where I have an 8-channel wav file (e.g., in AliMeeting). I create a Recording object from this file, and create several MonoCut objects for different supervisions on channel 0 of the recording (similar to the "SDM" setting in AMI). Now, if I need to load audio for the cuts (for instance, to extract features), it is much slower than if the audio were originally single-channel.
For example, on the AliMeeting train set, computing features using compute_and_store_features_batch() takes approx. 2h for IHM data (single-channel recordings) vs. 14h for SDM data (8-channel recordings).
This issue is already noted in the comment here, but I wanted to raise it explicitly to invite ideas about whether something could be done to selectively read channels from the AudioSource.
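To make the cost concrete, here is a minimal sketch (plain NumPy, not lhotse code) of why selecting one channel from an interleaved WAV does not save I/O: the bytes for all channels are read and decoded regardless, and the unwanted channels are discarded only afterwards.

```python
import numpy as np

# Illustrative only: WAV files store samples interleaved, so frame i of an
# 8-channel file holds 8 consecutive samples (one per channel). Reading
# "just channel 0" still means reading and decoding every channel, then
# throwing away 7/8 of the data -- which is why MonoCut loading is slow.

num_frames, num_channels = 1000, 8
# Simulated interleaved buffer as it would come off disk: c0 c1 ... c7 c0 c1 ...
interleaved = np.arange(num_frames * num_channels, dtype=np.int32)

# De-interleave: reshape to (frames, channels), then select channel 0.
frames = interleaved.reshape(num_frames, num_channels)
channel0 = frames[:, 0]

# Only 1/8 of the decoded samples are kept.
assert channel0.shape == (num_frames,)
assert channel0.size * num_channels == interleaved.size
```

The reshape-and-slice itself is cheap; the expense the issue describes comes from repeatedly reading and decoding the full 8-channel file once per cut.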
Issue Analytics
- Created: 9 months ago
- Comments: 5
Top GitHub Comments
In these cases, just use a DataLoader with num_workers > 0 and an unsupervised waveform dataset.
I don't think disk I/O is the bottleneck here. I have found the method quite fast when computing features for data that contain 1 utterance per recording (like LibriSpeech).
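The first suggestion above (a DataLoader with num_workers > 0) amortizes the per-cut read-and-decode cost across workers. A minimal stdlib sketch of the same idea using concurrent.futures instead of PyTorch; the load_cut_audio helper is hypothetical and stands in for cut.load_audio():

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for loading one cut's audio; in the real pipeline
# this would call cut.load_audio(), which decodes all channels and keeps one.
def load_cut_audio(cut_id: str) -> str:
    return f"audio-for-{cut_id}"

cut_ids = [f"cut-{i}" for i in range(16)]

# Overlap the (I/O + decode) work across workers, analogous to a torch
# DataLoader with num_workers > 0 over an unsupervised waveform dataset.
with ThreadPoolExecutor(max_workers=4) as pool:
    audios = list(pool.map(load_cut_audio, cut_ids))

assert len(audios) == len(cut_ids)
```

Parallelism hides latency but does not reduce the total bytes decoded per cut, so it mitigates rather than fixes the 8x overhead described in the issue.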