KeyError while using a custom feature extractor

See original GitHub issue

Hi, I have written a custom feature extractor class and set its name attribute to 'asrlib-extractor'. The feature cuts are produced without any error, but when I try to train a model in icefall, I get the following error:

Traceback (most recent call last):
  File "/home/gzi/.local/lib/python3.7/site-packages/lhotse/utils.py", line 660, in wrapper
    return fn(*args, **kwargs)
  File "/home/gzi/.local/lib/python3.7/site-packages/lhotse/cut.py", line 2963, in load_features
    reference_cut.features.type
  File "/home/gzi/.local/lib/python3.7/site-packages/lhotse/features/base.py", line 353, in create_default_feature_extractor
    return get_extractor_type(name)()
  File "/home/gzi/.local/lib/python3.7/site-packages/lhotse/features/base.py", line 343, in get_extractor_type
    return FEATURE_EXTRACTORS[name]
KeyError: 'asrlib-extractor'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./conv_emformer_ctc2/train.py", line 1318, in <module>
    main()
  File "./conv_emformer_ctc2/train.py", line 1311, in main
    run(rank=0, world_size=1, args=args)
  File "./conv_emformer_ctc2/train.py", line 1201, in run
    params=params,
  File "./conv_emformer_ctc2/train.py", line 1272, in scan_pessimistic_batches_for_oom
    batch = train_dl.dataset[cuts]
  File "/home/gzi/.local/lib/python3.7/site-packages/lhotse/dataset/speech_recognition.py", line 113, in __getitem__
    input_tpl = self.input_strategy(cuts)
  File "/home/gzi/.local/lib/python3.7/site-packages/lhotse/dataset/input_strategies.py", line 122, in __call__
    executor=_get_executor(self.num_workers, executor_type=self._executor_type),
  File "/home/gzi/.local/lib/python3.7/site-packages/lhotse/dataset/collation.py", line 138, in collate_features
    features[idx] = _read_features(cut)
  File "/home/gzi/.local/lib/python3.7/site-packages/lhotse/dataset/collation.py", line 514, in _read_features
    return torch.from_numpy(cut.load_features())
  File "/home/gzi/.local/lib/python3.7/site-packages/lhotse/utils.py", line 663, in wrapper
    f"{e}\n[extra info] When calling: {fn.__qualname__}(args={args} kwargs={kwargs})"
KeyError: "'asrlib-extractor'\n[extra info] When calling: MixedCut.load_features(args=(MixedCut(id='972a8469-1641-9f82-8b9d-2434e465e150', tracks=[MixTrack(cut=MonoCut(id='302-123516-0013-5363-0_sp0.9', start=0.0, duration=18.5166875, channel=0, supervisions=[SupervisionSegment(id='302-123516-0013_sp0.9', recording_id='302-123516-0013_sp0.9', start=0.0, duration=18.5166875, channel=0, text='WITH EMPHASIS AND DIGNITY IF AT ALL ROARED DAK KOVA BY THE DEAD HANDS AT MY THROAT BUT HE SHALL DIE BAR COMAS NO MAUDLIN WEAKNESS ON YOUR PART SHALL SAVE HIM', language='English', speaker='302', gender=None, custom=None, alignment=None)], features=Features(type='asrlib-extractor', num_frames=1852, num_features=40, frame_shift=0.01, sampling_rate=16000, start=0.0, duration=18.5166875, storage_type='lilcom_chunky', storage_path='data/fbank/feats_train-clean-100.lca', storage_key='1699845359,21386,19719,19004,13941', recording_id='None', channels=0), recording=Recording(id='302-123516-0013_sp0.9', sources=[AudioSource(type='file', channels=[0], source='/workspace/gzi/train-lvcsr-icefall-multi/download/LibriSpeech/train-clean-100/302/123516/302-123516-0013.flac')], sampling_rate=16000, num_samples=296267, duration=18.5166875, transforms=[{'name': 'Speed', 'kwargs': {'factor': 0.9}}]), custom=None), offset=0.0, snr=None), MixTrack(cut=MonoCut(id='af3c83ef-40b7-413a-8861-5f4eb14e3812', start=240.0, duration=10.0, channel=0, supervisions=[], features=Features(type='asrlib-extractor', num_frames=1000, num_features=40, frame_shift=0.01, sampling_rate=16000, start=240.0, duration=10.0, storage_type='lilcom_chunky', storage_path='data/fbank/musan_feats.lca', storage_key='686759667,20361,18819', recording_id='None', channels=0), recording=Recording(id='speech-librivox-0068', sources=[AudioSource(type='file', channels=[0], source='/workspace/gzi/train-lvcsr-icefall-multi/download/musan/speech/librivox/speech-librivox-0068.wav')], sampling_rate=16000, num_samples=4054622, duration=253.413875, 
transforms=None), custom=None), offset=0.0, snr=12.448918538034762), MixTrack(cut=MonoCut(id='95348514-2da9-4425-80a2-1491b5c110cb', start=10.0, duration=9.985, channel=0, supervisions=[], features=Features(type='asrlib-extractor', num_frames=1000, num_features=40, frame_shift=0.01, sampling_rate=16000, start=10.0, duration=10.0, storage_type='lilcom_chunky', storage_path='data/fbank/musan_feats.lca', storage_key='345449402,20204,18109', recording_id='None', channels=0), recording=Recording(id='music-jamendo-0035', sources=[AudioSource(type='file', channels=[0], source='/workspace/gzi/train-lvcsr-icefall-multi/download/musan/music/jamendo/music-jamendo-0035.wav')], sampling_rate=16000, num_samples=3781680, duration=236.355, transforms=None), custom=None), offset=10.0, snr=12.448918538034762), MixTrack(cut=PaddingCut(id='bd9c66b3-ad3c-2d6d-1a3d-1fa7bc8960a9', duration=0.005, sampling_rate=16000, feat_value=-23.025850929940457, num_frames=0, num_features=40, frame_shift=0.01, num_samples=80, custom=None), offset=19.985, snr=None)]),) kwargs={})"

This is the initial part of my feature extractor:

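from dataclasses import dataclass

from lhotse.features.base import FeatureExtractor, register_extractor
from lhotse.utils import Seconds

@dataclass
class HsaFeatureExtractorConfig:
    frame_len: Seconds = 0.025
    frame_shift: Seconds = 0.01

# The decorator registers the class under its `name` when this module is imported.
@register_extractor
class HsaFeatureExtractor(FeatureExtractor):
    """
    A FeatureExtractor class to extract HSA style features.
    """
    name = 'asrlib-extractor'
    config_type = HsaFeatureExtractorConfig

    def __init__(self):
...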

I appreciate any help regarding this.

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
mohsen-goodarzi commented, Oct 31, 2022

Thank you @csukuangfj. Your solution fixed the KeyError. Now I am facing a new error, but I think it is not related to lhotse anymore.

0 reactions
csukuangfj commented, Oct 31, 2022

You can add the following line into train.py:

from xxx import HsaFeatureExtractor

where xxx.py is the file in which you defined HsaFeatureExtractor.

Importing the module runs the @register_extractor decorator, which lets lhotse know that you have registered a new extractor. Otherwise, lhotse won't be aware that there is a new extractor.
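
To illustrate why the import matters, here is a minimal sketch of a decorator-based registry. It is a simplified stand-in and not lhotse's actual code: only the names FEATURE_EXTRACTORS, register_extractor and get_extractor_type are taken from the issue and its traceback, while the function bodies and the lookup_fails helper are assumptions made for the example.

```python
# Simplified stand-in for a decorator-based registry (assumed, not lhotse's code).
FEATURE_EXTRACTORS = {}


def register_extractor(cls):
    """Store the class in the registry under its `name` attribute."""
    FEATURE_EXTRACTORS[cls.name] = cls
    return cls


def get_extractor_type(name):
    """Look up a registered class; raises KeyError for unknown names."""
    return FEATURE_EXTRACTORS[name]


def lookup_fails(name):
    """Hypothetical helper: True if the name is not in the registry."""
    try:
        get_extractor_type(name)
    except KeyError:
        return True
    return False


# Before the class definition has executed, the name is unknown.
# This mirrors train.py running without importing the extractor module.
missing_before = lookup_fails("asrlib-extractor")


# Executing the class definition (which is what an import does)
# runs the decorator and fills the registry.
@register_extractor
class HsaFeatureExtractor:
    name = "asrlib-extractor"


found_after = get_extractor_type("asrlib-extractor") is HsaFeatureExtractor
```

In this sketch the lookup fails until the decorated class definition has run, which is the same situation as a training script that reads cuts with type 'asrlib-extractor' but never imports the module defining the extractor.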
