Error in decompressing LilcomChunkyWriter feature manifest
I used kaldifeat to extract some features and stored them using the default storage type, LilcomChunkyWriter, but it throws an error at data-loading time:
Traceback (most recent call last):
  File "/home/draj/anaconda3/envs/scale/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/draj/anaconda3/envs/scale/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 46, in fetch
    data = self.dataset[possibly_batched_index]
  File "/export/c07/draj/mini_scale_2022/lhotse/lhotse/dataset/speech_recognition.py", line 113, in __getitem__
    input_tpl = self.input_strategy(cuts)
  File "/export/c07/draj/mini_scale_2022/lhotse/lhotse/dataset/input_strategies.py", line 120, in __call__
    return collate_features(
  File "/export/c07/draj/mini_scale_2022/lhotse/lhotse/dataset/collation.py", line 138, in collate_features
    features[idx] = _read_features(cut)
  File "/export/c07/draj/mini_scale_2022/lhotse/lhotse/dataset/collation.py", line 477, in _read_features
    return torch.from_numpy(cut.load_features())
  File "/export/c07/draj/mini_scale_2022/lhotse/lhotse/utils.py", line 632, in wrapper
    raise type(e)(
ValueError: Something went wrong in decompression (likely bad data): decompress_float returned 7
[extra info] When calling: MonoCut.load_features(args=(MonoCut(id='0fc802cd6f15cb7e6324709659edd6e7_109-60-0', start=0, duration=11.58, channel=0, supervisions=[SupervisionSegment(id='0fc802cd6f15cb7e6324709659edd6e7_109', recording_id='0fc802cd6f15cb7e6324709659edd6e7_109', start=0, duration=11.58, channel=0, text='as the tenure of the leadership team has increased we have been able to initiate positive changes throughout our business structure that have directly contributed to our recent successes', language='English', speaker='0fc802cd6f15cb7e6324709659edd6e7', gender=None, custom=None, alignment=None)], features=Features(type=kaldifeat-fbank, num_frames=1158, num_features=80, frame_shift=0.01, sampling_rate=16000, start=0, duration=11.58, storage_type=lilcom_chunky, storage_path=data/fbank/feats_dev.lca, storage_key=5047284,44760,44501,14302, recording_id=None, channels=0), recording=Recording(id='0fc802cd6f15cb7e6324709659edd6e7_109', sources=[AudioSource(type=file, channels=[0], source=/export/c07/draj/mini_scale_2022/icefall/egs/spgispeech/ASR/download/spgispeech/spgispeech/train/0fc802cd6f15cb7e6324709659edd6e7/109.wav)], sampling_rate=16000, num_samples=185280, duration=11.58, transforms=None), custom=None),) kwargs={})
When I switched to LilcomHdf5Writer, data loading succeeded.
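The storage_key in the manifest above (5047284,44760,44501,14302) hints at how the lilcom_chunky format addresses data: it appears to be a byte offset into the .lca archive followed by the sizes of the compressed chunks. The following is a minimal sketch under that assumption (the function name is illustrative; check lhotse's LilcomChunkyReader for the authoritative layout):

```python
def parse_lilcom_chunky_key(storage_key: str):
    """Split a lilcom_chunky storage_key into (offset, chunk_sizes).

    Assumption: the first integer is the byte offset of the feature
    matrix inside the .lca archive, and the remaining integers are the
    byte sizes of the compressed chunks. A corrupt chunk boundary here
    would plausibly surface as the decompress_float error above.
    """
    parts = [int(p) for p in storage_key.split(",")]
    return parts[0], parts[1:]

offset, chunks = parse_lilcom_chunky_key("5047284,44760,44501,14302")
print(offset, chunks)  # 5047284 [44760, 44501, 14302]
```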
Issue Analytics
- State:
- Created 2 years ago
- Comments: 13 (5 by maintainers)
Top GitHub Comments
@luomingshuang if you want just a quick fix for this, I think setting num_workers=0 in your asr_datamodule.py works. It’s some kind of threading bug.
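The quick fix above amounts to disabling DataLoader worker subprocesses so decompression happens in the main process. A minimal sketch of toggling that setting (the helper name is hypothetical, not from the issue):

```python
def dataloader_kwargs(workaround: bool = True) -> dict:
    """Build DataLoader keyword arguments for the suggested quick fix.

    With num_workers=0, PyTorch loads batches in the main process,
    sidestepping the suspected threading bug in chunky-lilcom
    decompression described in this issue.
    """
    kwargs = {"batch_size": 1, "num_workers": 4}
    if workaround:
        kwargs["num_workers"] = 0  # quick fix from the comment above
    return kwargs

print(dataloader_kwargs())  # num_workers is forced to 0
```

These kwargs would be passed to torch.utils.data.DataLoader (e.g. in asr_datamodule.py); the cost is slower data loading, since all I/O and decompression run serially.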
Sorry I didn’t get time to get back to this, but it must have been a file corruption as you mention. You can close this issue for now. If I run into the error again, I’ll reopen it.