Speech-to-Text Container not working with python Speech SDK

Describe the bug

When running the Cognitive Services public preview container for on-premises speech-to-text, the Python Speech SDK is unable to recognize speech. The container runs fine and the Speech SDK appears to connect, but it returns no result and the container logs nothing further. The same code run against the public cloud behaves as expected.

To Reproduce

Steps to reproduce the behavior:

  1. Start the speech container (script pasted below)
  2. Invoke the Speech SDK via Python (script copied from the speech_recognize_continuous_from_file method in the Python examples)

Expected behavior

The SDK returns incremental callbacks as speech is being recognized.

Version of the Cognitive Services Speech SDK

1.6.0

Platform, Operating System, and Programming Language

  • OS: macOS High Sierra 10.13.6
  • Hardware: 2.7 GHz Intel Core i7
  • Programming language: Python

Additional context

  • script used to start speech container:
docker run --rm -it -p 5000:5000  --memory 4g --cpus 4  \
containerpreview.azurecr.io/microsoft/cognitive-services-speech-to-text \
EULA=accept \
Billing=https://westus.api.cognitive.microsoft.com/sts/v1.0 \
ApiKey=**myapikey** \
Logging:Console:LogLevel:Default=Debug
  • script used to invoke the speech sdk
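import time

import azure.cognitiveservices.speech as speechsdk

# Placeholder value (not from the original issue); replace with your own key.
speech_key = "YourSubscriptionKey"


def speech_recognize_continuous_from_file():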
    """performs continuous speech recognition with input from an audio file"""
    # <SpeechContinuousRecognitionWithFile>
    initial_silence_timeout_ms = 15 * 1e3
    template = "ws://localhost:5000/speech/recognition/dictation/cognitiveservices/v1?initialSilenceTimeoutMs={:d}"
    speech_config = speechsdk.SpeechConfig(subscription=speech_key,
                                           endpoint=template.format(int(initial_silence_timeout_ms)))

    audio_config = speechsdk.audio.AudioConfig(filename='mytestfile.wav')

    speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config)

    done = False

    def stop_cb(evt):
        """callback that stops continuous recognition upon receiving an event `evt`"""
        print('CLOSING on {}'.format(evt))
        speech_recognizer.stop_continuous_recognition()
        nonlocal done
        done = True

    # Connect callbacks to the events fired by the speech recognizer
    speech_recognizer.recognizing.connect(
        lambda evt: print('RECOGNIZING: {}'.format(evt)))
    speech_recognizer.recognized.connect(
        lambda evt: print('RECOGNIZED: {}'.format(evt)))
    speech_recognizer.session_started.connect(
        lambda evt: print('SESSION STARTED: {}'.format(evt)))
    speech_recognizer.session_stopped.connect(
        lambda evt: print('SESSION STOPPED {}'.format(evt)))
    speech_recognizer.canceled.connect(
        lambda evt: print('CANCELED {}'.format(evt)))
    # stop continuous recognition on either session stopped or canceled events
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)

    # Start continuous speech recognition
    speech_recognizer.start_continuous_recognition()
    while not done:
        time.sleep(.5)
    # </SpeechContinuousRecognitionWithFile>
  • I made the requisite changes to the SpeechConfig object: passing in the endpoint of the local container and removing the region parameter. In my testing, I also tried changing the local websocket URL to use SSL, which did throw some errors in the speech container.
  • Running the Python example speech_recognize_continuous_from_file produces the following output and nothing more: SESSION STARTED: SessionEventArgs(session_id=b78b00274ac149a0b43ffa647ed5ddc6). The container itself logs no further information when this script is invoked.
  • My Azure portal shows no activity that would indicate any processing has been done.
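
Because the script above only exits once a session_stopped or canceled event fires, a run that stalls after SESSION STARTED will loop forever. One way to make that failure visible is a bounded wait. This is a minimal sketch using only the standard library; the helper name wait_until_done and the 30-second default are illustrative assumptions, not part of the Speech SDK.

```python
import threading


def wait_until_done(done_event: threading.Event, timeout_s: float = 30.0) -> bool:
    """Block until done_event is set or timeout_s elapses.

    Returns True if the event was set, False if the wait timed out.
    """
    return done_event.wait(timeout=timeout_s)


if __name__ == "__main__":
    done_event = threading.Event()

    # In the recognition script, stop_cb would call done_event.set()
    # instead of flipping the `done` flag. A timer simulates that here.
    threading.Timer(0.1, done_event.set).start()

    if wait_until_done(done_event, timeout_s=5.0):
        print("recognition finished")
    else:
        print("timed out waiting for session_stopped or canceled")
```

With this in place, the hang described above would surface as an explicit timeout message rather than a silent wait.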

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 9 (2 by maintainers)

Top GitHub Comments

1 reaction
yshahin commented, Sep 3, 2019

Can you try https://hub.docker.com/r/antsu/on-prem-client to hit the container? Use this command on your Mac:

docker run --rm -it antsu/on-prem-client ./speech-to-text-client -r local --mac --expect "What's the weather like" ./audio/whatstheweatherlike.wav

If this works, then the issue is with the Speech SDK on OSX.
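
A simpler check in the same spirit is to confirm that the container's published port accepts TCP connections at all before involving the SDK. This is a minimal sketch using only the standard library; the helper name is_port_open is an illustrative assumption, and localhost:5000 is taken from the -p 5000:5000 mapping in the docker run command. A successful connection only shows that something is listening, not that recognition works.

```python
import socket


def is_port_open(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Host and port assumed from the docker run command in the issue.
    print(is_port_open("localhost", 5000))
```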

0 reactions
paras55 commented, Sep 26, 2020
1. The Speech SDK has a transport library dependency that doesn't support non-SSL connections

The SDK also doesn’t support HTTP connections. I have been running a Flask server, getting an HTTP link, and trying to hit the API, but it doesn’t respond. Is there a fix?
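
Since these comments turn on which URL scheme the endpoint uses, it can help to inspect the endpoint string before passing it to SpeechConfig. This is a minimal sketch using only the standard library; the helper name describe_endpoint is an illustrative assumption. It only reports what the URL says, and whether the SDK accepts a given scheme is not established in this thread.

```python
from urllib.parse import urlsplit


def describe_endpoint(endpoint: str) -> dict:
    """Summarize the scheme, host, and port of an endpoint URL."""
    parts = urlsplit(endpoint)
    return {
        "scheme": parts.scheme,
        # ws and wss are the websocket schemes; http and https are not.
        "is_websocket": parts.scheme in ("ws", "wss"),
        # wss and https are the TLS-protected variants.
        "is_tls": parts.scheme in ("wss", "https"),
        "host": parts.hostname,
        "port": parts.port,
    }


if __name__ == "__main__":
    # Endpoint copied from the script in the issue.
    endpoint = (
        "ws://localhost:5000/speech/recognition/dictation/"
        "cognitiveservices/v1?initialSilenceTimeoutMs=15000"
    )
    print(describe_endpoint(endpoint))
```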
