Speech-to-Text Container not working with python Speech SDK

Describe the bug

When running the Cognitive Services public preview container for on-premises speech-to-text, the Python Speech SDK is unable to recognize speech. The container runs fine and the Speech SDK appears to connect, but it returns no result and the container logs nothing further. When the same code is run against the public cloud, the Speech SDK behaves as expected.

To Reproduce

Steps to reproduce the behavior:

1. Start the speech container (script pasted below).
2. Invoke the Speech SDK via Python (the script is copied from the speech_recognize_continuous_from_file method in the Python examples).

Expected behavior

The SDK returns incremental callbacks as speech is being recognized.

Version of the Cognitive Services Speech SDK

1.6.0

Platform, Operating System, and Programming Language

OS: macOS High Sierra 10.13.6
Hardware: 2.7 GHz Intel Core i7
Programming language: Python

Additional context

Script used to start the speech container:

docker run --rm -it -p 5000:5000  --memory 4g --cpus 4  \
containerpreview.azurecr.io/microsoft/cognitive-services-speech-to-text \
EULA=accept \
Billing=https://westus.api.cognitive.microsoft.com/sts/v1.0 \
ApiKey=**myapikey** \
Logging:Console:LogLevel:Default=Debug
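
Since the report says the SDK "appears to connect", one quick sanity check before involving the SDK is to confirm that the published port accepts TCP connections at all. The sketch below is only an illustration using the Python standard library; `is_port_open` is a hypothetical helper name, and `localhost:5000` is assumed from the `-p 5000:5000` mapping in the command above. A successful connect shows only that something is listening, not that the speech endpoint is healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False


if __name__ == "__main__":
    # Assumed host and port, taken from the -p 5000:5000 mapping above.
    print("container reachable:", is_port_open("localhost", 5000))
```

If this prints False, the problem is likely in the Docker port mapping or container startup rather than in the Speech SDK.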

Script used to invoke the Speech SDK:

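import time

import azure.cognitiveservices.speech as speechsdk

# Placeholder: replace with your own subscription key.
speech_key = "YourSubscriptionKey"


def speech_recognize_continuous_from_file():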
    """performs continuous speech recognition with input from an audio file"""
    # <SpeechContinuousRecognitionWithFile>
    initial_silence_timeout_ms = 15 * 1e3
    template = "ws://localhost:5000/speech/recognition/dictation/cognitiveservices/v1?initialSilenceTimeoutMs={:d}"
    speech_config = speechsdk.SpeechConfig(subscription=speech_key,
                                           endpoint=template.format(int(initial_silence_timeout_ms)))

    audio_config = speechsdk.audio.AudioConfig(filename='mytestfile.wav')

    speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config)

    done = False

    def stop_cb(evt):
        """callback that stops continuous recognition upon receiving an event `evt`"""
        print('CLOSING on {}'.format(evt))
        speech_recognizer.stop_continuous_recognition()
        nonlocal done
        done = True

    # Connect callbacks to the events fired by the speech recognizer
    speech_recognizer.recognizing.connect(
        lambda evt: print('RECOGNIZING: {}'.format(evt)))
    speech_recognizer.recognized.connect(
        lambda evt: print('RECOGNIZED: {}'.format(evt)))
    speech_recognizer.session_started.connect(
        lambda evt: print('SESSION STARTED: {}'.format(evt)))
    speech_recognizer.session_stopped.connect(
        lambda evt: print('SESSION STOPPED {}'.format(evt)))
    speech_recognizer.canceled.connect(
        lambda evt: print('CANCELED {}'.format(evt)))
    # stop continuous recognition on either session stopped or canceled events
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)

    # Start continuous speech recognition
    speech_recognizer.start_continuous_recognition()
    while not done:
        time.sleep(.5)
    # </SpeechContinuousRecognitionWithFile>

I made the requisite changes to the SpeechConfig object: passing in the endpoint of the local image and removing the region parameter. In my testing, I tried updating the local WebSocket URL to use SSL, which did in fact throw some errors in the speech container.
Using the Python example for speech_recognize_continuous_from_file specifically, running the sample produces the following output and nothing more: SESSION STARTED: SessionEventArgs(session_id=b78b00274ac149a0b43ffa647ed5ddc6). Furthermore, the container itself doesn’t log any more information when this script is invoked.
No activity is shown in my Azure portal to indicate that any processing has been done.
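
Because the report switches between the plain and SSL forms of the WebSocket URL, it can help to build the endpoint string in one place so the two variants differ only in scheme. This is a minimal sketch using the standard library; `build_container_endpoint` and its parameters are hypothetical names, and the path and query parameter are copied from the script above. Whether the container accepts `wss` is not established by this report.

```python
from urllib.parse import urlencode, urlunsplit

# Path copied from the script in this report.
RECOGNITION_PATH = "/speech/recognition/dictation/cognitiveservices/v1"


def build_container_endpoint(host="localhost", port=5000,
                             initial_silence_timeout_ms=15000, secure=False):
    """Build the container WebSocket endpoint used in the report's script."""
    # Hypothetical switch: "wss" for the SSL variant, "ws" otherwise.
    scheme = "wss" if secure else "ws"
    # int() mirrors the original script, which computes 15 * 1e3 as a float.
    query = urlencode({"initialSilenceTimeoutMs": int(initial_silence_timeout_ms)})
    return urlunsplit((scheme, f"{host}:{port}", RECOGNITION_PATH, query, ""))


if __name__ == "__main__":
    print(build_container_endpoint())
    print(build_container_endpoint(secure=True))
```

The returned string can be passed as the `endpoint` argument in the same way the script above passes `template.format(...)`.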

Top GitHub Comments

yshahin commented, Sep 3, 2019

Can you try hitting the container with this client: https://hub.docker.com/r/antsu/on-prem-client. Use this command on your Mac:

docker run --rm -it antsu/on-prem-client ./speech-to-text-client -r local --mac --expect "What's the weather like" ./audio/whatstheweatherlike.wav

If this works, then the issue is with the Speech SDK on OS X.

paras55 commented, Sep 26, 2020

1. The Speech SDK has a transport library dependency that doesn't support non-SSL connections

The SDK also doesn’t support HTTP connections. I have been running a Flask server, getting an http link, and trying to hit the API, but it doesn’t respond. Is there a fix?
