
Receiving a continuous stream of ResultReason::NoMatch recognition results


Describe the bug

This happens frequently, but not on every call.

1. I start streaming recognition by calling m_recognizer->StartContinuousRecognitionAsync().
2. I immediately (within the same second) get my onSpeechEndDetected handler invoked, signaling end of speech.
3. I then immediately get a recognition result with “RecognitionStatus”:“InitialSilenceTimeout”.

Note: at this point, less than one second has passed since calling StartContinuousRecognitionAsync().
Also: I did not set PropertyId::SpeechServiceConnection_InitialSilenceTimeoutMs to any value.

4. Since I did not get a result, I immediately restart recognition. To do that, I a) call m_recognizer->StopContinuousRecognitionAsync().get(); /* to stop the current recognition */ and b) create a new m_recognizer and call m_recognizer->StartContinuousRecognitionAsync().
5. Again, I immediately get the same two results: a) onSpeechEndDetected and b) a recognition result with “RecognitionStatus”:“InitialSilenceTimeout”.

I restart again, and this continues. I cycle through steps 4 and 5 above many times per second: I try to start recognition and instantly get onSpeechEndDetected/InitialSilenceTimeout. This can go on for 20 seconds, until I abandon the effort altogether.

A more detailed log is shown below.

So my questions are:

Is this a bug? Has it been seen before?
Am I doing something wrong in the sequence above? I have been assuming that when I get a recognition event with ResultReason::RecognizedSpeech and no transcript, I must then restart recognition, as I am doing.
Does this have to do with my not setting SpeechServiceConnection_InitialSilenceTimeoutMs to any value?

Any other thoughts or guidance appreciated. For now, Azure speech reco is not working in my product 😦.

To Reproduce

See above.

Expected behavior

I did not expect to immediately get a recognition result with no speech detected over and over, less than 1 sec after starting recognition.

Version of the Cognitive Services Speech SDK

1.22.0

Platform, Operating System, and Programming Language

OS: Debian 10
Hardware: x64
Programming language: C++
Browser: N/A

Additional context

Here is an example log file showing how frequently this is happening. It is from my C++ program:

2022-08-19 09:09:04.969604 start transcribing
2022-08-19 09:09:05.529595 responseHandler event azure_transcribe::end_of_utterance, body (null).
2022-08-19 09:09:05.549589 responseHandler event azure_transcribe::no_speech_detected, body {"Id":"5d690f57895c421a8a03bc00224994cf","RecognitionStatus":"InitialSilenceTimeout","Offset":41400000,"Duration":8600000}.

2022-08-19 09:09:05.589592 start transcribing en-US
2022-08-19 09:09:05.649594 azure_transcribe::end_of_utterance, body (null).
2022-08-19 09:09:05.649594 responseHandler event azure_transcribe::no_speech_detected, body {"Id":"62a0ed7d63784e25b774b33da1af3332","RecognitionStatus":"InitialSilenceTimeout","Offset":0,"Duration":3000000}.

2022-08-19 09:09:05.729590 start transcribing
2022-08-19 09:09:05.809594 responseHandler event azure_transcribe::end_of_utterance, body (null).
2022-08-19 09:09:05.809594 responseHandler event azure_transcribe::no_speech_detected, body {"Id":"6c5dfd3a4b534e288bccdc8d6e74c7e9","RecognitionStatus":"InitialSilenceTimeout","Offset":0,"Duration":1000000}.

2022-08-19 09:09:05.869595 start transcribing
2022-08-19 09:09:05.949602 responseHandler event azure_transcribe::end_of_utterance, body (null).
2022-08-19 09:09:05.949602 responseHandler event azure_transcribe::no_speech_detected, body {"Id":"539ca4da416040318382680de1ebec2b","RecognitionStatus":"InitialSilenceTimeout","Offset":0,"Duration":1000000}.

2022-08-19 09:09:06.009591 start transcribing en-US complete
2022-08-19 09:09:06.089599 responseHandler event azure_transcribe::end_of_utterance, body (null).
2022-08-19 09:09:06.089599 responseHandler event azure_transcribe::no_speech_detected, body {"Id":"3e1559e84f7148b3986cfaca4472074a","RecognitionStatus":"InitialSilenceTimeout","Offset":0,"Duration":1000000}.

2022-08-19 09:09:06.129661 [mod_azure_transcribe.c:147 start transcribing 
2022-08-19 09:09:06.210747 responseHandler event azure_transcribe::end_of_utterance, body (null).
2022-08-19 09:09:06.210747 responseHandler event azure_transcribe::no_speech_detected, body {"Id":"5af4abdc5a104a3884ad0fcd6132fdaf","RecognitionStatus":"InitialSilenceTimeout","Offset":0,"Duration":1000000}.
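To read the Offset and Duration fields in the JSON bodies above, it helps to convert them to seconds. This sketch assumes both fields are expressed in 100-nanosecond ticks, which is how the Speech service documents them as far as I know. The helper name ticksToSeconds is made up for illustration and is not part of the SDK.

```cpp
#include <cstdint>

// Assumption: Offset and Duration are in 100-nanosecond ticks,
// which gives 10,000,000 ticks per second.
constexpr std::int64_t kTicksPerSecond = 10000000;

// Hypothetical helper, not an SDK function.
constexpr double ticksToSeconds(std::int64_t ticks)
{
    return static_cast<double>(ticks) / kTicksPerSecond;
}
```

Under that assumption, the first result in the log has an offset of 4.14 s and a duration of 0.86 s, so its silence window ends 5.0 s into the audio stream. The later results all start at offset 0 and last 0.3 s or 0.1 s.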


Top GitHub Comments


pankopon commented, Aug 31, 2022

@davehorton When you use continuous recognition, you only need to end recognition:

when the input stream/file ends (CancellationReason::EndOfStream),
when the user chooses to stop (e.g. presses a button in the app), or
when there’s an error (CancellationReason::Error).

You do not need to (and should not) call StopContinuousRecognitionAsync after every Recognized event and start anew. With StartContinuousRecognitionAsync you start a recognition session. During a session there can be several Recognized events, with either ResultReason::RecognizedSpeech or ResultReason::NoMatch, meaning that you will receive recognition results one after another until the whole input (file) has been processed or recognition is stopped due to some other reason (ref. above). One Recognized event covers just one phrase (or max length of speech with no match, or a silence timeout period) from the input, and processing for more automatically continues.
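The rule described above can be sketched without the SDK. The enum, struct, and function names below are hypothetical stand-ins that only loosely mirror ResultReason and CancellationReason. The point is that a NoMatch result is counted and skipped, while the session ends only on end of stream, an error, or a user request.

```cpp
#include <vector>

// Hypothetical event model for illustration; these are NOT SDK types.
enum class Event
{
    RecognizedSpeech,     // Recognized event with a transcript
    NoMatch,              // Recognized event without one (e.g. a silence timeout)
    CanceledEndOfStream,  // input stream or file ended
    CanceledError,        // service or connection error
    UserStop              // user asked to stop
};

// True only for the events that should end the session.
bool endsSession(Event e)
{
    return e == Event::CanceledEndOfStream
        || e == Event::CanceledError
        || e == Event::UserStop;
}

struct SessionStats
{
    int phrases = 0;    // RecognizedSpeech results seen
    int noMatches = 0;  // NoMatch results seen while the session kept running
    int stops = 0;      // times the session was stopped (0 or 1)
};

// Walks one session's events and stops at the first terminating event.
SessionStats runSession(const std::vector<Event>& events)
{
    SessionStats stats;
    for (Event e : events)
    {
        if (endsSession(e))
        {
            ++stats.stops;
            break;
        }
        if (e == Event::RecognizedSpeech)
            ++stats.phrases;
        else
            ++stats.noMatches; // only NoMatch is left at this point
    }
    return stats;
}
```

Feeding this model a run of NoMatch events followed by a phrase and an end-of-stream cancellation yields a single stop, in contrast to the restart-per-result loop described in the question.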

Here’s an example with file input (= recognition will be stopped at the latest when the end of file is reached). If you want to use microphone input, remove AudioConfig and e.g. wait for a keypress after StartContinuousRecognitionAsync.

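    // Assumes #include <speechapi_cxx.h>, <future> and <iostream>, plus
    //   using namespace std;
    //   using namespace Microsoft::CognitiveServices::Speech;
    //   using namespace Microsoft::CognitiveServices::Speech::Audio;
    // FromSomething(...) is a placeholder for one of the SpeechConfig factory methods.
    auto config = SpeechConfig::FromSomething(...);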

    auto audioInput = AudioConfig::FromWavFileInput("YourSpeechFile.wav");
    auto recognizer = SpeechRecognizer::FromConfig(config, audioInput);

    promise<void> recognitionEnd;

    recognizer->Recognizing.Connect([](const SpeechRecognitionEventArgs& e)
    {
        cout << "Recognizing:" << e.Result->Text << endl;
    });

    recognizer->Recognized.Connect([](const SpeechRecognitionEventArgs& e)
    {
        if (e.Result->Reason == ResultReason::RecognizedSpeech)
        {
            cout << "RECOGNIZED: Text=" << e.Result->Text << endl;
        }
        else if (e.Result->Reason == ResultReason::NoMatch)
        {
            cout << "NOMATCH: Reason=";
            auto reason = NoMatchDetails::FromResult(e.Result)->Reason;
            switch (reason)
            {
            case NoMatchReason::NotRecognized:
                cout << "NotRecognized";
                break;
            case NoMatchReason::InitialSilenceTimeout:
                cout << "SilenceTimeout";
                break;
            default:
                cout << "Other (" << (int)reason << ")";
                break;
            }
            cout << endl;
        }
    });

    recognizer->Canceled.Connect([](const SpeechRecognitionCanceledEventArgs& e)
    {
        switch (e.Reason)
        {
        case CancellationReason::EndOfStream:
            cout << "CANCELED: End of stream." << endl;
            break;

        case CancellationReason::Error:
            // NOTE: In case of an error, do not use the same recognizer for recognition anymore.
            cout << "CANCELED: ErrorCode=" << (int)e.ErrorCode << endl;
            cout << "CANCELED: ErrorDetails=" << e.ErrorDetails << endl;
            break;

        default:
            cout << "CANCELED: Other reason (" << (int)e.Reason << ")" << endl;
            break;
        }
    });

    recognizer->SessionStarted.Connect([](const SessionEventArgs& e)
    {
        cout << "Session started." << endl;
    });

    recognizer->SessionStopped.Connect([&recognitionEnd](const SessionEventArgs& e)
    {
        cout << "Session stopped." << endl;
        recognitionEnd.set_value();
    });

    recognizer->StartContinuousRecognitionAsync().get();
    recognitionEnd.get_future().get();
    recognizer->StopContinuousRecognitionAsync().get();
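The last three lines of the example rely on std::promise and std::future to block the calling thread until the SessionStopped handler fires. Here is a stripped-down sketch of that pattern using only the standard library, with a plain std::thread standing in for the SDK's callback thread. The function name waitForStopSignal and both of its parameters are made up for illustration.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Blocks until a background "handler" signals completion or the timeout
// expires. Returns true if the signal arrived in time.
bool waitForStopSignal(std::chrono::milliseconds handlerDelay,
                       std::chrono::milliseconds timeout)
{
    std::promise<void> recognitionEnd;
    std::future<void> done = recognitionEnd.get_future();

    // Stand-in for the SessionStopped callback running on another thread.
    std::thread handler([&recognitionEnd, handlerDelay]
    {
        std::this_thread::sleep_for(handlerDelay);
        recognitionEnd.set_value(); // a promise may be fulfilled only once
    });

    const bool signaled =
        done.wait_for(timeout) == std::future_status::ready;

    // Join before returning: the thread holds a reference to recognitionEnd.
    handler.join();
    return signaled;
}
```

Unlike the example above, this sketch waits with a timeout rather than calling get() directly. That is a design choice of the sketch, so that a handler that never fires cannot block the caller indefinitely.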


pankopon commented, Sep 1, 2022

Closed as answered, please open a new issue with detailed info if more support is needed.