
Receiving a continuous stream of ResultReason::NoMatch recognition results

See original GitHub issue

Describe the bug

This happens frequently, but not on every call.

  1. I start streaming recognition by calling m_recognizer->StartContinuousRecognitionAsync()
  2. I immediately (same second) get my onSpeechEndDetected handler invoked (signaling end of speech)
  3. I then immediately get a recognition result with "RecognitionStatus":"InitialSilenceTimeout".

Note: at this point, less than one second has passed since calling StartContinuousRecognitionAsync().
Also: I did not set PropertyId::SpeechServiceConnection_InitialSilenceTimeoutMs to any value.

  4. Since I did not get a result, I immediately restart recognition. To do that, I (a) call m_recognizer->StopContinuousRecognitionAsync().get(); /* to stop the current recognition */ and (b) create a new m_recognizer and call m_recognizer->StartContinuousRecognitionAsync().

  5. Again, I immediately get the same two results: (a) onSpeechEndDetected and (b) a recognition result with "RecognitionStatus":"InitialSilenceTimeout".

I restart again, and this continues on. I cycle through steps 4/5 above many times per second: I try to start recognition and instantly get onSpeechEndDetected/InitialSilenceTimeout. This can go on for 20 seconds, until I abandon the effort altogether.
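The restart cycle in steps 4/5 can be sketched roughly as follows (a simplified illustration of the sequence described above, not my actual code; createRecognizer() is a hypothetical stand-in for my recognizer setup):

    // Sketch of the restart cycle: stop the current session, then
    // create a fresh recognizer and start again. In practice this
    // loop repeats many times per second, as the log below shows.
    m_recognizer->StopContinuousRecognitionAsync().get();
    m_recognizer = createRecognizer(); // hypothetical helper that rebuilds the recognizer
    m_recognizer->StartContinuousRecognitionAsync();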

A more detailed log is shown below.

So my questions are:

  1. Is this a bug? Has it been seen before?
  2. Am I doing something wrong in my sequence above? I have been assuming that when I get a recognition event with ResultReason::RecognizedSpeech and no transcript, I must then restart recognition, as I am doing. Is that assumption wrong?
  3. Does this have to do with my not setting SpeechServiceConnection_InitialSilenceTimeoutMs to any value?
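For reference, explicitly setting that timeout on the SpeechConfig would look roughly like this (a sketch only; the 5000 ms value is just an example, not a recommendation):

    // Sketch: raise the initial silence timeout (in milliseconds)
    // on the SpeechConfig before creating the recognizer.
    config->SetProperty(PropertyId::SpeechServiceConnection_InitialSilenceTimeoutMs, "5000");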

Any other thoughts or guidance appreciated. For now, Azure speech recognition is not working in my product 😦.

To Reproduce

See above.

Expected behavior

I did not expect to immediately get a recognition result with no speech detected over and over, less than 1 sec after starting recognition.

Version of the Cognitive Services Speech SDK

1.22.0

Platform, Operating System, and Programming Language

  • OS: Debian 10
  • Hardware: x64
  • Programming language: C++
  • Browser: N/A

Additional context

Here is an example log file showing how frequently this is happening. This is from my C++ program.

2022-08-19 09:09:04.969604 start transcribing
2022-08-19 09:09:05.529595 responseHandler event azure_transcribe::end_of_utterance, body (null).
2022-08-19 09:09:05.549589 responseHandler event azure_transcribe::no_speech_detected, body {"Id":"5d690f57895c421a8a03bc00224994cf","RecognitionStatus":"InitialSilenceTimeout","Offset":41400000,"Duration":8600000}.

2022-08-19 09:09:05.589592 start transcribing en-US
2022-08-19 09:09:05.649594 azure_transcribe::end_of_utterance, body (null).
2022-08-19 09:09:05.649594 responseHandler event azure_transcribe::no_speech_detected, body {"Id":"62a0ed7d63784e25b774b33da1af3332","RecognitionStatus":"InitialSilenceTimeout","Offset":0,"Duration":3000000}.

2022-08-19 09:09:05.729590 start transcribing
2022-08-19 09:09:05.809594 responseHandler event azure_transcribe::end_of_utterance, body (null).
2022-08-19 09:09:05.809594 responseHandler event azure_transcribe::no_speech_detected, body {"Id":"6c5dfd3a4b534e288bccdc8d6e74c7e9","RecognitionStatus":"InitialSilenceTimeout","Offset":0,"Duration":1000000}.

2022-08-19 09:09:05.869595 start transcribing
2022-08-19 09:09:05.949602 responseHandler event azure_transcribe::end_of_utterance, body (null).
2022-08-19 09:09:05.949602 responseHandler event azure_transcribe::no_speech_detected, body {"Id":"539ca4da416040318382680de1ebec2b","RecognitionStatus":"InitialSilenceTimeout","Offset":0,"Duration":1000000}.

2022-08-19 09:09:06.009591 start transcribing en-US complete
2022-08-19 09:09:06.089599 responseHandler event azure_transcribe::end_of_utterance, body (null).
2022-08-19 09:09:06.089599 responseHandler event azure_transcribe::no_speech_detected, body {"Id":"3e1559e84f7148b3986cfaca4472074a","RecognitionStatus":"InitialSilenceTimeout","Offset":0,"Duration":1000000}.

2022-08-19 09:09:06.129661 [mod_azure_transcribe.c:147 start transcribing 
2022-08-19 09:09:06.210747 responseHandler event azure_transcribe::end_of_utterance, body (null).
2022-08-19 09:09:06.210747 responseHandler event azure_transcribe::no_speech_detected, body {"Id":"5af4abdc5a104a3884ad0fcd6132fdaf","RecognitionStatus":"InitialSilenceTimeout","Offset":0,"Duration":1000000}.

Issue Analytics

  • State:closed
  • Created a year ago
  • Comments:5 (3 by maintainers)

Top GitHub Comments

1 reaction

pankopon commented, Aug 31, 2022

@davehorton When you use continuous recognition, you only need to end recognition

  • when the input stream/file ends (CancellationReason::EndOfStream),
  • based on a user’s choice (e.g. pressing a button in the app to stop),
  • or when there’s an error (CancellationReason::Error).

You do not need to (and should not) call StopContinuousRecognitionAsync after every Recognized event and start anew. With StartContinuousRecognitionAsync you start a recognition session. During a session there can be several Recognized events, with either ResultReason::RecognizedSpeech or ResultReason::NoMatch: you will receive recognition results one after another until the whole input (file) has been processed or recognition is stopped for some other reason (see above). One Recognized event covers just one phrase (or the maximum length of speech with no match, or a silence timeout period) from the input, and processing automatically continues.

Here’s an example with file input (= recognition will be stopped at the latest when the end of file is reached). If you want to use microphone input, remove AudioConfig and e.g. wait for a keypress after StartContinuousRecognitionAsync.

    // Requires the Speech SDK headers and <future> for std::promise, e.g.:
    //   #include <speechapi_cxx.h>
    //   #include <future>
    //   using namespace std;
    //   using namespace Microsoft::CognitiveServices::Speech;
    //   using namespace Microsoft::CognitiveServices::Speech::Audio;
    auto config = SpeechConfig::FromSomething(...);

    auto audioInput = AudioConfig::FromWavFileInput("YourSpeechFile.wav");
    auto recognizer = SpeechRecognizer::FromConfig(config, audioInput);

    promise<void> recognitionEnd;

    recognizer->Recognizing.Connect([](const SpeechRecognitionEventArgs& e)
    {
        cout << "Recognizing:" << e.Result->Text << endl;
    });

    recognizer->Recognized.Connect([](const SpeechRecognitionEventArgs& e)
    {
        if (e.Result->Reason == ResultReason::RecognizedSpeech)
        {
            cout << "RECOGNIZED: Text=" << e.Result->Text << endl;
        }
        else if (e.Result->Reason == ResultReason::NoMatch)
        {
            cout << "NOMATCH: Reason=";
            auto reason = NoMatchDetails::FromResult(e.Result)->Reason;
            switch (reason)
            {
            case NoMatchReason::NotRecognized:
                cout << "NotRecognized";
                break;
            case NoMatchReason::InitialSilenceTimeout:
                cout << "SilenceTimeout";
                break;
            default:
                cout << "Other (" << (int)reason << ")";
                break;
            }
            cout << endl;
        }
    });

    recognizer->Canceled.Connect([](const SpeechRecognitionCanceledEventArgs& e)
    {
        switch (e.Reason)
        {
        case CancellationReason::EndOfStream:
            cout << "CANCELED: End of stream." << endl;
            break;

        case CancellationReason::Error:
            // NOTE: In case of an error, do not use the same recognizer for recognition anymore.
            cout << "CANCELED: ErrorCode=" << (int)e.ErrorCode << endl;
            cout << "CANCELED: ErrorDetails=" << e.ErrorDetails << endl;
            break;

        default:
            cout << "CANCELED: Other reason (" << (int)e.Reason << ")" << endl;
            break;
        }
    });

    recognizer->SessionStarted.Connect([](const SessionEventArgs& e)
    {
        cout << "Session started." << endl;
    });

    recognizer->SessionStopped.Connect([&recognitionEnd](const SessionEventArgs& e)
    {
        cout << "Session stopped." << endl;
        recognitionEnd.set_value();
    });

    recognizer->StartContinuousRecognitionAsync().get();
    recognitionEnd.get_future().get();
    recognizer->StopContinuousRecognitionAsync().get();

0 reactions

pankopon commented, Sep 1, 2022

Closed as answered, please open a new issue with detailed info if more support is needed.
