Speech to text conversion seems to handle max 15~16 seconds of audio

Please provide us with the following information:

This issue is for a: (mark with an `x`)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Select any WAV audio file longer than 16 seconds. I tried converting files using both option 3 and option 6, but only the first 15~16 seconds of audio are recognized and converted into text. The rest of the audio is ignored. There is no error message.

Any log messages given by the failure

Expected/desired behavior

Entire audio file should be converted.

OS and Version?

Windows 10 Pro

Versions

Mention any other details that might be useful

The REST API documentation does mention that it handles a maximum of 15 seconds of audio, but says that longer audio is handled by the SDK. I am using the SDK.

Issue Analytics

Created 5 years ago
Comments: 20 (10 by maintainers)

Top GitHub Comments

chlandsi commented, Jan 11, 2019

> The SDK sample for longer audio is in C#. Is it available for Python as well?

Here is some example code on how to use continuous recognition on audio files of arbitrary length:

import time

import azure.cognitiveservices.speech as speechsdk

# speech_key, service_region and audiofilename must be defined by the caller
# (subscription key, service region and path to the WAV file).

def speech_recognize_continuous_from_file():
    """performs continuous speech recognition with input from an audio file"""
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    audio_config = speechsdk.audio.AudioConfig(filename=audiofilename)

    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    done = False

    def stop_cb(evt):
        """callback that stops continuous recognition upon receiving an event `evt`"""
        print('CLOSING on {}'.format(evt))
        speech_recognizer.stop_continuous_recognition()
        nonlocal done
        done = True

    # Connect callbacks to the events fired by the speech recognizer
    speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
    speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
    speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
    speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
    # stop continuous recognition on either session stopped or canceled events
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)

    # Start continuous speech recognition
    speech_recognizer.start_continuous_recognition()
    # Poll until stop_cb has flagged the end of the session
    while not done:
        time.sleep(.5)
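
The sample above only prints each event. If the goal is the full transcript of the file, one possible approach is to collect the text of every `recognized` event and join it once the session ends. The following is a minimal sketch, not part of the SDK: `collect_transcript` is a hypothetical helper name, and it assumes the recognizer passed in exposes the same events and methods used in the sample above (`recognized`, `session_stopped`, `canceled`, `start_continuous_recognition`, `stop_continuous_recognition`) and that each `recognized` event carries its text in `evt.result.text`.

```python
import threading


def collect_transcript(recognizer, timeout=None):
    """Run continuous recognition and return all recognized text as one string.

    `recognizer` is assumed (hypothetically) to behave like the
    SpeechRecognizer from the sample above; this helper is not an SDK function.
    """
    segments = []
    finished = threading.Event()

    def on_recognized(evt):
        # Each `recognized` event is assumed to hold the final text of one utterance.
        text = evt.result.text
        if text:
            segments.append(text)

    def on_finished(evt):
        # Fired when the session stops (end of file) or recognition is canceled.
        finished.set()

    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(on_finished)
    recognizer.canceled.connect(on_finished)

    recognizer.start_continuous_recognition()
    # Block until one of the end-of-session callbacks fires (or the timeout expires).
    finished.wait(timeout)
    recognizer.stop_continuous_recognition()
    return ' '.join(segments)
```

With the `speech_recognizer` object built as in the sample above, this could be called as `print(collect_transcript(speech_recognizer))`. Waiting on a `threading.Event` replaces the `while not done` polling loop, and the stop call is made from the calling thread rather than from inside a callback.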

wolfma61 commented, Oct 2, 2018

Hey @lwluc - as Mark said, please open a new issue for new and unrelated questions in the future.

The sample is using RecognizeOnce, which limits recognition to 10-15 seconds. See https://docs.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/speechrecognizer?view=azure-node-latest: "recognizeOnceAsync: Starts speech recognition, and stops after the first utterance is recognized."

For long-running audio you will need to use startContinuousRecognitionAsync/stopContinuousRecognitionAsync and subscribe to the recognition events. https://docs.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/speechrecognizer?view=azure-node-latest#startcontinuousrecognitionasync

thx Wolfgang
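
As a rough aid for choosing between the two modes discussed in this thread, the length of a WAV file can be checked with Python's standard `wave` module before recognition starts. This is only a sketch: the helper names and the 15-second default are assumptions taken from the figures quoted in this thread, not SDK constants. Single-shot recognition stops at the first utterance regardless of file length, so duration is only a heuristic.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, 'rb') as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def needs_continuous_recognition(path, single_shot_limit=15.0):
    """Return True if the file is longer than the assumed single-shot limit.

    `single_shot_limit` is a hypothetical threshold based on the 10-15 seconds
    mentioned in this thread; adjust it as needed.
    """
    return wav_duration_seconds(path) > single_shot_limit
```

A file for which `needs_continuous_recognition` returns `True` is a candidate for the continuous recognition approach rather than a single recognize-once call.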