Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Difficulty seeing meaningful changes with hotword boosting

See original GitHub issue

I am trying to test hotword boosting on a model meant to diagnose pronunciation mistakes, so the tokens are in IPA (international phonetic alphabet), but otherwise everything should work the same.

I have two related issues.

I’m having trouble getting the hotword to change the result at all, even when using insane hotword weights like 9999999.0. Any ideas why this might be happening?
I can occasionally get the result to change, but I have an example below where the inclusion of a hotword changes a word in the result, but it doesn’t output the hotword. Model output before CTCDecode: ðɪs wɪl bi dɪskʌst wɪð ɪndʌstɹi (this will be discussed with industry) Hotword used: dɪskʌsd (changing t for d) Model output after CTCDecode: ðɪs wɪl bi dɪskʌs wɪð ɪndʌstɹi (the t at the end of ‘dɪskʌs’ disappears)

I didn’t think this was possible based on how hotword boosting works? Am I misunderstanding or is this potentially a bug?

Env info

pyctcdecode 0.1.0
numpy 1.21.0
Non BPE model
No LM

Code


# Change from 1 x classes x lengths to length x classes
probabilities = probabilities.transpose(1, 2).squeeze(0)
decoder = build_ctcdecoder(labels)
hotwords = ["wɪd", "dɪskʌsd"]
text = decoder.decode(probabilities.detach().numpy(), hotwords=hotwords, hotword_weight=1000.0)

print(text)

Issue Analytics

State:
Created 2 years ago
Comments:9 (5 by maintainers)

Top GitHub Comments

1reaction

rbraccocommented, Jul 12, 2021

Thanks, this will at least give me some rabbit holes to go down and see if I can tune a decent decoder myself.

1reaction

poneillcommented, Jul 12, 2021

It may take some futzing with the defaults in order to see good performance for any given use case. For example, if I run:

text = decoder.decode(
    probabilities,
    hotwords=hotwords,
    hotword_weight=100,
    beam_prune_logp=-100,
    token_min_logp=-10
)

I get:

ðɪs wɪd bi dɪskʌsd wɪd ɪndʌstɹi

which is, if not a great decoding, hopefully at least evidence that the hotwords feature is working as intended.

It may be that the chosen defaults for beam_prune_logp and token_min_logp should be different when the user submits hotwords, but it’s hard to tell from a single example. Ideally the user would perform a hyperparameter search in order to tune the decoder to their use case. I’m not opposed to adding a convenience function to that effect, provided that we can cover most of what people expect out of such a function, something like:

decoder = pyctcdecode.build_and_tune_decoder_from(
    train_logit_matrices,
    train_transcriptions, 
    alphabet, 
    possible_hotwords, 
    metric='wer',
    tuning_iterations=100
)

@gkucsko wdyt?