Receiving a continuous stream of ResultReason::NoMatch recognition results
Describe the bug
This happens frequently, but not on every call.
1. I start streaming recognition by calling m_recognizer->StartContinuousRecognitionAsync().
2. I immediately (within the same second) get my onSpeechEndDetected handler invoked, signaling end of speech.
3. I then immediately get a recognition result with "RecognitionStatus":"InitialSilenceTimeout".
   Note: at this point, less than one second has passed since calling StartContinuousRecognitionAsync().
   Also: I did not set PropertyId::SpeechServiceConnection_InitialSilenceTimeoutMs to any value.
4. Since I did not get a result, I immediately restart recognition. To do that, I a) call m_recognizer->StopContinuousRecognitionAsync().get(); /* to stop the current recognition */ and b) create a new m_recognizer and call m_recognizer->StartContinuousRecognitionAsync().
5. Again, I immediately get the same two results: a) onSpeechEndDetected and b) a recognition result with "RecognitionStatus":"InitialSilenceTimeout".

I restart again, and this continues. I cycle through steps 4 and 5 above many times per second: I try to start recognition and instantly get onSpeechEndDetected/InitialSilenceTimeout. This can go on for 20 seconds, until I abandon the effort altogether.
A more detailed log is shown below.
So my questions are:
- Is this a bug? Has it been seen before?
- Am I doing something wrong in my sequence above? I have been assuming that when I get a recognition event with ResultReason::RecognizedSpeech and no transcript, I must then restart recognition, as I am doing. Is that correct?
- Does this have to do with me not setting the SpeechServiceConnection_InitialSilenceTimeoutMs to any value?
Any other thoughts or guidance appreciated. For now, Azure speech reco is not working in my product 😦.
To Reproduce
See above.
Expected behavior
I did not expect to immediately get a recognition result with no speech detected, over and over, less than 1 second after starting recognition.
Version of the Cognitive Services Speech SDK
1.22.0
Platform, Operating System, and Programming Language
- OS: Debian 10
- Hardware - x64
- Programming language: C++
- Browser [e.g. Chrome, Safari] (if applicable) - N/A
Additional context
Here is an example log excerpt showing how frequently this happens. It is from my C++ program:
2022-08-19 09:09:04.969604 start transcribing
2022-08-19 09:09:05.529595 responseHandler event azure_transcribe::end_of_utterance, body (null).
2022-08-19 09:09:05.549589 responseHandler event azure_transcribe::no_speech_detected, body {"Id":"5d690f57895c421a8a03bc00224994cf","RecognitionStatus":"InitialSilenceTimeout","Offset":41400000,"Duration":8600000}.
2022-08-19 09:09:05.589592 start transcribing en-US
2022-08-19 09:09:05.649594 azure_transcribe::end_of_utterance, body (null).
2022-08-19 09:09:05.649594 responseHandler event azure_transcribe::no_speech_detected, body {"Id":"62a0ed7d63784e25b774b33da1af3332","RecognitionStatus":"InitialSilenceTimeout","Offset":0,"Duration":3000000}.
2022-08-19 09:09:05.729590 start transcribing
2022-08-19 09:09:05.809594 responseHandler event azure_transcribe::end_of_utterance, body (null).
2022-08-19 09:09:05.809594 responseHandler event azure_transcribe::no_speech_detected, body {"Id":"6c5dfd3a4b534e288bccdc8d6e74c7e9","RecognitionStatus":"InitialSilenceTimeout","Offset":0,"Duration":1000000}.
2022-08-19 09:09:05.869595 start transcribing
2022-08-19 09:09:05.949602 responseHandler event azure_transcribe::end_of_utterance, body (null).
2022-08-19 09:09:05.949602 responseHandler event azure_transcribe::no_speech_detected, body {"Id":"539ca4da416040318382680de1ebec2b","RecognitionStatus":"InitialSilenceTimeout","Offset":0,"Duration":1000000}.
2022-08-19 09:09:06.009591 start transcribing en-US complete
2022-08-19 09:09:06.089599 responseHandler event azure_transcribe::end_of_utterance, body (null).
2022-08-19 09:09:06.089599 responseHandler event azure_transcribe::no_speech_detected, body {"Id":"3e1559e84f7148b3986cfaca4472074a","RecognitionStatus":"InitialSilenceTimeout","Offset":0,"Duration":1000000}.
2022-08-19 09:09:06.129661 [mod_azure_transcribe.c:147 start transcribing
2022-08-19 09:09:06.210747 responseHandler event azure_transcribe::end_of_utterance, body (null).
2022-08-19 09:09:06.210747 responseHandler event azure_transcribe::no_speech_detected, body {"Id":"5af4abdc5a104a3884ad0fcd6132fdaf","RecognitionStatus":"InitialSilenceTimeout","Offset":0,"Duration":1000000}.
Top GitHub Comments
@davehorton When you use continuous recognition, you only need to end recognition when the session is canceled, for example because the end of the input was reached (CancellationReason::EndOfStream) or because of an error (CancellationReason::Error). You do not need to (and should not) call StopContinuousRecognitionAsync after every Recognized event and start anew. With StartContinuousRecognitionAsync you start a recognition session. During a session there can be several Recognized events, with either ResultReason::RecognizedSpeech or ResultReason::NoMatch. This means you will receive recognition results one after another until the whole input (file) has been processed or recognition is stopped for some other reason (see above). One Recognized event covers just one phrase from the input (or the maximum length of speech with no match, or a silence timeout period), and processing of the rest continues automatically.

Here's an example with file input (= recognition will be stopped at the latest when the end of the file is reached). If you want to use microphone input, remove AudioConfig and, for example, wait for a keypress after StartContinuousRecognitionAsync.

Closed as answered. Please open a new issue with detailed info if more support is needed.