Error in decompressing LilcomChunkyWriter feature manifest

I used kaldifeat to extract features and stored them with the default storage type, LilcomChunkyWriter, but data loading fails with the following error:

Traceback (most recent call last):
  File "/home/draj/anaconda3/envs/scale/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/draj/anaconda3/envs/scale/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 46, in fetch
    data = self.dataset[possibly_batched_index]
  File "/export/c07/draj/mini_scale_2022/lhotse/lhotse/dataset/speech_recognition.py", line 113, in __getitem__
    input_tpl = self.input_strategy(cuts)
  File "/export/c07/draj/mini_scale_2022/lhotse/lhotse/dataset/input_strategies.py", line 120, in __call__
    return collate_features(
  File "/export/c07/draj/mini_scale_2022/lhotse/lhotse/dataset/collation.py", line 138, in collate_features
    features[idx] = _read_features(cut)
  File "/export/c07/draj/mini_scale_2022/lhotse/lhotse/dataset/collation.py", line 477, in _read_features
    return torch.from_numpy(cut.load_features())
  File "/export/c07/draj/mini_scale_2022/lhotse/lhotse/utils.py", line 632, in wrapper
    raise type(e)(
ValueError: Something went wrong in decompression (likely bad data): decompress_float returned 7
[extra info] When calling: MonoCut.load_features(args=(MonoCut(id='0fc802cd6f15cb7e6324709659edd6e7_109-60-0', start=0, duration=11.58, channel=0, supervisions=[SupervisionSegment(id='0fc802cd6f15cb7e6324709659edd6e7_109', recording_id='0fc802cd6f15cb7e6324709659edd6e7_109', start=0, duration=11.58, channel=0, text='as the tenure of the leadership team has increased we have been able to initiate positive changes throughout our business structure that have directly contributed to our recent successes', language='English', speaker='0fc802cd6f15cb7e6324709659edd6e7', gender=None, custom=None, alignment=None)], features=Features(type=kaldifeat-fbank, num_frames=1158, num_features=80, frame_shift=0.01, sampling_rate=16000, start=0, duration=11.58, storage_type=lilcom_chunky, storage_path=data/fbank/feats_dev.lca, storage_key=5047284,44760,44501,14302, recording_id=None, channels=0), recording=Recording(id='0fc802cd6f15cb7e6324709659edd6e7_109', sources=[AudioSource(type=file, channels=[0], source=/export/c07/draj/mini_scale_2022/icefall/egs/spgispeech/ASR/download/spgispeech/spgispeech/train/0fc802cd6f15cb7e6324709659edd6e7/109.wav)], sampling_rate=16000, num_samples=185280, duration=11.58, transforms=None), custom=None),) kwargs={})

When I switched to using LilcomHdf5Writer, the data loading was successful.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 13 (5 by maintainers)

Top GitHub Comments

1 reaction
danpovey commented, Jun 7, 2022

@luomingshuang if you want just a quick fix for this, I think setting num_workers=0 in your asr_datamodule.py works. It’s some kind of threading bug.

1 reaction
desh2608 commented, Mar 15, 2022

Sorry I didn’t get time to get back to this, but it must have been a file corruption as you mention. You can close this issue for now. If I run into the error again, I’ll reopen it.
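
The two comments above point at different causes (a worker/threading problem versus a corrupted file). One way to tell them apart, sketched here under assumptions, is to read every stored feature matrix in a single process and record which entries fail: if nothing fails without DataLoader workers, on-disk corruption looks less likely. The helper name `find_unreadable` and the manifest path in the usage comment are hypothetical; `cut.load_features()` is the call that appears in the traceback.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def find_unreadable(
    items: Iterable[T], load: Callable[[T], object]
) -> List[Tuple[T, Exception]]:
    """Call `load` on each item in the current process and collect failures.

    Hypothetical helper, not part of lhotse. It keeps scanning after an
    error so that every unreadable entry is reported, not just the first.
    """
    failures: List[Tuple[T, Exception]] = []
    for item in items:
        try:
            load(item)
        except Exception as exc:
            failures.append((item, exc))
    return failures


# Assumed usage with lhotse (the manifest path is a placeholder):
#   from lhotse import CutSet
#   cuts = CutSet.from_file("path/to/cuts_dev.jsonl.gz")
#   bad = find_unreadable(cuts, lambda cut: cut.load_features())
#   for cut, exc in bad:
#       print(cut.id, exc)
```

If the scan reports the same cut IDs on every run, the stored data is the more likely culprit; if it reports nothing while loading with workers still fails, the `num_workers=0` workaround is the more relevant lead.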
