Error in decompressing LilcomChunkyWriter feature manifest
I used kaldifeat to extract some features and stored them using the default storage type, LilcomChunkyWriter, but it throws an error at data-loading time:
Traceback (most recent call last):
  File "/home/draj/anaconda3/envs/scale/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/draj/anaconda3/envs/scale/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 46, in fetch
    data = self.dataset[possibly_batched_index]
  File "/export/c07/draj/mini_scale_2022/lhotse/lhotse/dataset/speech_recognition.py", line 113, in __getitem__
    input_tpl = self.input_strategy(cuts)
  File "/export/c07/draj/mini_scale_2022/lhotse/lhotse/dataset/input_strategies.py", line 120, in __call__
    return collate_features(
  File "/export/c07/draj/mini_scale_2022/lhotse/lhotse/dataset/collation.py", line 138, in collate_features
    features[idx] = _read_features(cut)
  File "/export/c07/draj/mini_scale_2022/lhotse/lhotse/dataset/collation.py", line 477, in _read_features
    return torch.from_numpy(cut.load_features())
  File "/export/c07/draj/mini_scale_2022/lhotse/lhotse/utils.py", line 632, in wrapper
    raise type(e)(
ValueError: Something went wrong in decompression (likely bad data): decompress_float returned 7
[extra info] When calling: MonoCut.load_features(args=(MonoCut(id='0fc802cd6f15cb7e6324709659edd6e7_109-60-0', start=0, duration=11.58, channel=0, supervisions=[SupervisionSegment(id='0fc802cd6f15cb7e6324709659edd6e7_109', recording_id='0fc802cd6f15cb7e6324709659edd6e7_109', start=0, duration=11.58, channel=0, text='as the tenure of the leadership team has increased we have been able to initiate positive changes throughout our business structure that have directly contributed to our recent successes', language='English', speaker='0fc802cd6f15cb7e6324709659edd6e7', gender=None, custom=None, alignment=None)], features=Features(type=kaldifeat-fbank, num_frames=1158, num_features=80, frame_shift=0.01, sampling_rate=16000, start=0, duration=11.58, storage_type=lilcom_chunky, storage_path=data/fbank/feats_dev.lca, storage_key=5047284,44760,44501,14302, recording_id=None, channels=0), recording=Recording(id='0fc802cd6f15cb7e6324709659edd6e7_109', sources=[AudioSource(type=file, channels=[0], source=/export/c07/draj/mini_scale_2022/icefall/egs/spgispeech/ASR/download/spgispeech/spgispeech/train/0fc802cd6f15cb7e6324709659edd6e7/109.wav)], sampling_rate=16000, num_samples=185280, duration=11.58, transforms=None), custom=None),) kwargs={})
When I switched to LilcomHdf5Writer, data loading succeeded.
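The storage_key in the manifest above (5047284,44760,44501,14302) hints at how the lilcom_chunky format addresses data: it appears to be a byte offset into the .lca archive followed by the sizes of the compressed chunks. The following is a minimal sketch under that assumption (the function name is illustrative; check lhotse's LilcomChunkyReader for the authoritative layout):

```python
def parse_lilcom_chunky_key(storage_key: str):
    """Split a lilcom_chunky storage_key into (offset, chunk_sizes).

    Assumption: the first integer is the byte offset of the feature
    matrix inside the .lca archive, and the remaining integers are the
    byte sizes of the compressed chunks. A corrupt chunk boundary here
    would plausibly surface as the decompress_float error above.
    """
    parts = [int(p) for p in storage_key.split(",")]
    return parts[0], parts[1:]

offset, chunks = parse_lilcom_chunky_key("5047284,44760,44501,14302")
print(offset, chunks)  # 5047284 [44760, 44501, 14302]
```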
Issue Analytics
- State:
- Created 2 years ago
- Comments: 13 (5 by maintainers)
Top GitHub Comments
@luomingshuang if you want just a quick fix for this, I think setting num_workers=0 in your asr_datamodule.py works. It’s some kind of threading bug.
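The quick fix above amounts to disabling DataLoader worker subprocesses so decompression happens in the main process. A minimal sketch of toggling that setting (the helper name is hypothetical, not from the issue):

```python
def dataloader_kwargs(workaround: bool = True) -> dict:
    """Build DataLoader keyword arguments for the suggested quick fix.

    With num_workers=0, PyTorch loads batches in the main process,
    sidestepping the suspected threading bug in chunky-lilcom
    decompression described in this issue.
    """
    kwargs = {"batch_size": 1, "num_workers": 4}
    if workaround:
        kwargs["num_workers"] = 0  # quick fix from the comment above
    return kwargs

print(dataloader_kwargs())  # num_workers is forced to 0
```

These kwargs would be passed to torch.utils.data.DataLoader (e.g. in asr_datamodule.py); the cost is slower data loading, since all I/O and decompression run serially.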
Sorry I didn’t get time to get back to this, but it must have been a file corruption as you mention. You can close this issue for now. If I run into the error again, I’ll reopen it.