KeyError while using a custom feature extractor
Hi, I have prepared a custom feature extractor class and set its name attribute to 'asrlib-extractor'. The feature cuts are produced without any error, but when I try to train a model in icefall, I get the following error:
Traceback (most recent call last):
File "/home/gzi/.local/lib/python3.7/site-packages/lhotse/utils.py", line 660, in wrapper
return fn(*args, **kwargs)
File "/home/gzi/.local/lib/python3.7/site-packages/lhotse/cut.py", line 2963, in load_features
reference_cut.features.type
File "/home/gzi/.local/lib/python3.7/site-packages/lhotse/features/base.py", line 353, in create_default_feature_extractor
return get_extractor_type(name)()
File "/home/gzi/.local/lib/python3.7/site-packages/lhotse/features/base.py", line 343, in get_extractor_type
return FEATURE_EXTRACTORS[name]
KeyError: 'asrlib-extractor'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./conv_emformer_ctc2/train.py", line 1318, in <module>
main()
File "./conv_emformer_ctc2/train.py", line 1311, in main
run(rank=0, world_size=1, args=args)
File "./conv_emformer_ctc2/train.py", line 1201, in run
params=params,
File "./conv_emformer_ctc2/train.py", line 1272, in scan_pessimistic_batches_for_oom
batch = train_dl.dataset[cuts]
File "/home/gzi/.local/lib/python3.7/site-packages/lhotse/dataset/speech_recognition.py", line 113, in __getitem__
input_tpl = self.input_strategy(cuts)
File "/home/gzi/.local/lib/python3.7/site-packages/lhotse/dataset/input_strategies.py", line 122, in __call__
executor=_get_executor(self.num_workers, executor_type=self._executor_type),
File "/home/gzi/.local/lib/python3.7/site-packages/lhotse/dataset/collation.py", line 138, in collate_features
features[idx] = _read_features(cut)
File "/home/gzi/.local/lib/python3.7/site-packages/lhotse/dataset/collation.py", line 514, in _read_features
return torch.from_numpy(cut.load_features())
File "/home/gzi/.local/lib/python3.7/site-packages/lhotse/utils.py", line 663, in wrapper
f"{e}\n[extra info] When calling: {fn.__qualname__}(args={args} kwargs={kwargs})"
KeyError: "'asrlib-extractor'\n[extra info] When calling: MixedCut.load_features(args=(MixedCut(id='972a8469-1641-9f82-8b9d-2434e465e150', tracks=[MixTrack(cut=MonoCut(id='302-123516-0013-5363-0_sp0.9', start=0.0, duration=18.5166875, channel=0, supervisions=[SupervisionSegment(id='302-123516-0013_sp0.9', recording_id='302-123516-0013_sp0.9', start=0.0, duration=18.5166875, channel=0, text='WITH EMPHASIS AND DIGNITY IF AT ALL ROARED DAK KOVA BY THE DEAD HANDS AT MY THROAT BUT HE SHALL DIE BAR COMAS NO MAUDLIN WEAKNESS ON YOUR PART SHALL SAVE HIM', language='English', speaker='302', gender=None, custom=None, alignment=None)], features=Features(type='asrlib-extractor', num_frames=1852, num_features=40, frame_shift=0.01, sampling_rate=16000, start=0.0, duration=18.5166875, storage_type='lilcom_chunky', storage_path='data/fbank/feats_train-clean-100.lca', storage_key='1699845359,21386,19719,19004,13941', recording_id='None', channels=0), recording=Recording(id='302-123516-0013_sp0.9', sources=[AudioSource(type='file', channels=[0], source='/workspace/gzi/train-lvcsr-icefall-multi/download/LibriSpeech/train-clean-100/302/123516/302-123516-0013.flac')], sampling_rate=16000, num_samples=296267, duration=18.5166875, transforms=[{'name': 'Speed', 'kwargs': {'factor': 0.9}}]), custom=None), offset=0.0, snr=None), MixTrack(cut=MonoCut(id='af3c83ef-40b7-413a-8861-5f4eb14e3812', start=240.0, duration=10.0, channel=0, supervisions=[], features=Features(type='asrlib-extractor', num_frames=1000, num_features=40, frame_shift=0.01, sampling_rate=16000, start=240.0, duration=10.0, storage_type='lilcom_chunky', storage_path='data/fbank/musan_feats.lca', storage_key='686759667,20361,18819', recording_id='None', channels=0), recording=Recording(id='speech-librivox-0068', sources=[AudioSource(type='file', channels=[0], source='/workspace/gzi/train-lvcsr-icefall-multi/download/musan/speech/librivox/speech-librivox-0068.wav')], sampling_rate=16000, num_samples=4054622, duration=253.413875, 
transforms=None), custom=None), offset=0.0, snr=12.448918538034762), MixTrack(cut=MonoCut(id='95348514-2da9-4425-80a2-1491b5c110cb', start=10.0, duration=9.985, channel=0, supervisions=[], features=Features(type='asrlib-extractor', num_frames=1000, num_features=40, frame_shift=0.01, sampling_rate=16000, start=10.0, duration=10.0, storage_type='lilcom_chunky', storage_path='data/fbank/musan_feats.lca', storage_key='345449402,20204,18109', recording_id='None', channels=0), recording=Recording(id='music-jamendo-0035', sources=[AudioSource(type='file', channels=[0], source='/workspace/gzi/train-lvcsr-icefall-multi/download/musan/music/jamendo/music-jamendo-0035.wav')], sampling_rate=16000, num_samples=3781680, duration=236.355, transforms=None), custom=None), offset=10.0, snr=12.448918538034762), MixTrack(cut=PaddingCut(id='bd9c66b3-ad3c-2d6d-1a3d-1fa7bc8960a9', duration=0.005, sampling_rate=16000, feat_value=-23.025850929940457, num_frames=0, num_features=40, frame_shift=0.01, num_samples=80, custom=None), offset=19.985, snr=None)]),) kwargs={})"
This is the initial part of my feature extractor:
@dataclass
class HsaFeatureExtractorConfig:
    frame_len: Seconds = 0.025
    frame_shift: Seconds = 0.01

@register_extractor
class HsaFeatureExtractor(FeatureExtractor):
    """
    A FeatureExtractor class to extract HSA style features.
    """
    name = 'asrlib-extractor'
    config_type = HsaFeatureExtractorConfig

    def __init__(self):
        ...
I appreciate any help regarding this.
Top GitHub Comments
Thank you @csukuangfj. Your solution solved the KeyError. Now I am facing a new error, but I think it is no longer related to lhotse.
You can add a line to train.py that imports xxx, where xxx.py is the file in which you defined HsaFeatureExtractor. That import lets lhotse know that you have registered a new extractor. Otherwise, lhotse won’t be aware that there is a new extractor.
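
The reason an import is enough can be illustrated with a small, self-contained sketch of a decorator-based registry. This is a simplified stand-in, not lhotse's actual implementation: the names FEATURE_EXTRACTORS, register_extractor and get_extractor_type are borrowed from the traceback above, but their bodies here are assumptions made for illustration. The point is that the decorator only runs when the class definition is executed, which happens when the module containing it is imported.

```python
# Simplified stand-in for a decorator-based registry (NOT lhotse's real code).
# The names mirror those in the traceback; the bodies are illustrative assumptions.
FEATURE_EXTRACTORS = {}


def register_extractor(cls):
    """Store the class under its `name` attribute and return it unchanged."""
    FEATURE_EXTRACTORS[cls.name] = cls
    return cls


def get_extractor_type(name):
    """Look up a registered class; raises KeyError if it was never registered."""
    return FEATURE_EXTRACTORS[name]


def lookup_fails(name):
    """Return True if the name is not in the registry."""
    try:
        get_extractor_type(name)
    except KeyError:
        return True
    return False


# Before the defining module has been imported, the registry has no entry.
missing_before = lookup_fails("asrlib-extractor")


# Importing the module that defines the class executes this definition,
# and the decorator adds the class to the registry as a side effect.
@register_extractor
class HsaFeatureExtractor:
    name = "asrlib-extractor"


found_after = get_extractor_type("asrlib-extractor") is HsaFeatureExtractor
print(missing_before, found_after)
```

In the training script, the same effect would come from importing the module that defines the extractor before any features are loaded. An import that exists only for its side effect is often flagged by linters as unused, so it may need a suppression comment.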