Difficulty seeing meaningful changes with hotword boosting
I am trying to test hotword boosting on a model meant to diagnose pronunciation mistakes, so the tokens are in IPA (the International Phonetic Alphabet), but otherwise everything should work the same.
I have two related issues.
- I’m having trouble getting the hotword to change the result at all, even when using insane hotword weights like 9999999.0. Any ideas why this might be happening?
- I can occasionally get the result to change, but in the example below the inclusion of a hotword changes a word in the result without actually outputting the hotword.
Model output before CTCDecode:
ðɪs wɪl bi dɪskʌst wɪð ɪndʌstɹi
(this will be discussed with industry)

Hotword used: dɪskʌsd
(changing the t to a d)

Model output after CTCDecode:
ðɪs wɪl bi dɪskʌs wɪð ɪndʌstɹi
(the t at the end of ‘dɪskʌs’ disappears)
I didn’t think this was possible given how hotword boosting works. Am I misunderstanding something, or is this potentially a bug?
Env info
- pyctcdecode 0.1.0
- numpy 1.21.0
- Non-BPE model
- No LM
Code
from pyctcdecode import build_ctcdecoder

# probabilities is a torch tensor of shape (1, classes, lengths);
# reshape to (lengths, classes), which is what pyctcdecode expects
probabilities = probabilities.transpose(1, 2).squeeze(0)

decoder = build_ctcdecoder(labels)
hotwords = ["wɪd", "dɪskʌsd"]
text = decoder.decode(probabilities.detach().numpy(), hotwords=hotwords, hotword_weight=1000.0)
print(text)
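
When a hotword seems to have no effect, it can help to inspect the full beam list rather than just the top transcript. A minimal diagnostic sketch using decode_beams (the five-field beam tuple unpacked here matches recent pyctcdecode releases and may differ in 0.1.0):

beams = decoder.decode_beams(
    probabilities.detach().numpy(),
    hotwords=hotwords,
    hotword_weight=1000.0,
)
# If no surviving beam ever contains the hotword, the boost has nothing
# to act on, e.g. because the needed token was pruned before scoring.
for text, _, _, logit_score, combined_score in beams[:5]:
    print(f"{combined_score:.2f}  {logit_score:.2f}  {text}")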
Issue Analytics
- Created 2 years ago
- Comments: 9 (5 by maintainers)
Top GitHub Comments
Thanks, this will at least give me some rabbit holes to go down and see if I can tune a decent decoder myself.
It may take some futzing with the defaults in order to see good performance for any given use case. For example, if I run:
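
The snippet from this comment was not captured by the page. A plausible reconstruction, reusing the decoder from the issue and with the loosened pruning values purely illustrative, would be:

text = decoder.decode(
    probabilities.detach().numpy(),
    hotwords=["wɪd", "dɪskʌsd"],
    hotword_weight=1000.0,
    beam_prune_logp=-20.0,  # looser than the default of roughly -10.0
    token_min_logp=-15.0,   # looser than the default of roughly -5.0
)
print(text)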
I get:
ðɪs wɪd bi dɪskʌsd wɪd ɪndʌstɹi
which is, if not a great decoding, hopefully at least evidence that the hotwords feature is working as intended.
It may be that the chosen defaults for beam_prune_logp and token_min_logp should be different when the user submits hotwords, but it’s hard to tell from a single example. Ideally the user would perform a hyperparameter search in order to tune the decoder to their use case. I’m not opposed to adding a convenience function to that effect, provided that we can cover most of what people expect out of such a function, something like:

@gkucsko wdyt?
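
The convenience function itself was not shown in the thread. A hypothetical sketch of such a helper (the name tune_decoder, the parameter grid, and the exact-match error metric are assumptions for illustration, not pyctcdecode API):

import itertools

def tune_decoder(decoder, logits_list, references, hotwords):
    # Candidate values to sweep; entirely illustrative.
    grid = {
        "hotword_weight": [10.0, 100.0, 1000.0],
        "beam_prune_logp": [-10.0, -20.0],
        "token_min_logp": [-5.0, -15.0],
    }
    best_params, best_errors = None, float("inf")
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        # Count transcripts that fail to match their reference exactly;
        # a real helper would more likely minimize CER or WER.
        errors = sum(
            decoder.decode(logits, hotwords=hotwords, **params) != ref
            for logits, ref in zip(logits_list, references)
        )
        if errors < best_errors:
            best_params, best_errors = params, errors
    return best_params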