Speech-to-Text Container not working with python Speech SDK
Describe the bug When running the Cognitive Services public preview container for on-premises speech-to-text, the Python Speech SDK is unable to recognize speech. The container runs fine and the Speech SDK appears to connect, but it returns no result and the container logs nothing further. When the same code is run against the public cloud, the Speech SDK behaves as expected.
To Reproduce Steps to reproduce the behavior:
- Start speech container (Script pasted below)
- Invoke speech sdk via python (script is copied from speech_recognize_continuous_from_file method in python examples)
Expected behavior The SDK returns incremental callbacks as speech is being recognized.
Version of the Cognitive Services Speech SDK 1.6.0
Platform, Operating System, and Programming Language
- OS: Mac OS High Sierra 10.13.6
- Hardware - 2.7 GHz Intel Core i7
- Programming language: Python
Additional context
- script used to start speech container:
docker run --rm -it -p 5000:5000 --memory 4g --cpus 4 \
containerpreview.azurecr.io/microsoft/cognitive-services-speech-to-text \
EULA=accept \
Billing=https://westus.api.cognitive.microsoft.com/sts/v1.0 \
ApiKey=**myapikey** \
Logging:Console:LogLevel:Default=Debug
- script used to invoke the speech sdk
import time
import azure.cognitiveservices.speech as speechsdk

# speech_key is assumed to be defined elsewhere in the module, as in the SDK samples.

def speech_recognize_continuous_from_file():
    """performs continuous speech recognition with input from an audio file"""
    # <SpeechContinuousRecognitionWithFile>
    initial_silence_timeout_ms = 15 * 1e3
    template = "ws://localhost:5000/speech/recognition/dictation/cognitiveservices/v1?initialSilenceTimeoutMs={:d}"
    # Point the SDK at the local container endpoint instead of a cloud region
    speech_config = speechsdk.SpeechConfig(subscription=speech_key,
                                           endpoint=template.format(int(initial_silence_timeout_ms)))
    audio_config = speechsdk.audio.AudioConfig(filename='mytestfile.wav')
    speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config)

    done = False

    def stop_cb(evt):
        """callback that stops continuous recognition upon receiving an event `evt`"""
        print('CLOSING on {}'.format(evt))
        speech_recognizer.stop_continuous_recognition()
        nonlocal done
        done = True

    # Connect callbacks to the events fired by the speech recognizer
    speech_recognizer.recognizing.connect(
        lambda evt: print('RECOGNIZING: {}'.format(evt)))
    speech_recognizer.recognized.connect(
        lambda evt: print('RECOGNIZED: {}'.format(evt)))
    speech_recognizer.session_started.connect(
        lambda evt: print('SESSION STARTED: {}'.format(evt)))
    speech_recognizer.session_stopped.connect(
        lambda evt: print('SESSION STOPPED {}'.format(evt)))
    speech_recognizer.canceled.connect(
        lambda evt: print('CANCELED {}'.format(evt)))
    # stop continuous recognition on either session stopped or canceled events
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)

    # Start continuous speech recognition
    speech_recognizer.start_continuous_recognition()
    while not done:
        time.sleep(.5)
    # </SpeechContinuousRecognitionWithFile>
- I made the requisite changes to the SpeechConfig object - passing in the endpoint of the local image and removing the region parameter. In my testing, I tried updating the local websocket URL to use SSL, which did in fact throw some errors in the speech container.
- Using the python example for speech_recognize_continuous_from_file specifically, when running the sample it provides the following output:
SESSION STARTED: SessionEventArgs(session_id=b78b00274ac149a0b43ffa647ed5ddc6)
and nothing more. Furthermore, the container itself doesn't log any more information when this script is invoked.
- No activity is shown in my Azure portal indicating that any processing has been done.
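Since the SDK reports a started session but the container logs nothing, one thing worth ruling out before digging into the SDK is whether the container port accepts connections at all. The following is a minimal sketch using only the Python standard library; the helper name `endpoint_is_reachable` is hypothetical and not part of the Speech SDK. A successful TCP connection only shows that something is listening on the mapped port, not that the speech service behind it is healthy.

```python
import socket
from urllib.parse import urlsplit


def endpoint_is_reachable(endpoint, timeout=3.0):
    """Return True if a TCP connection to the endpoint's host and port succeeds.

    Hypothetical helper for troubleshooting; it does not speak the
    WebSocket or speech protocol, it only opens and closes a socket.
    """
    parts = urlsplit(endpoint)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or (443 if parts.scheme in ("wss", "https") else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling `endpoint_is_reachable("ws://localhost:5000/speech/recognition/dictation/cognitiveservices/v1")` while the container is running would be expected to return `True`; a `False` result points at the Docker port mapping or the container itself rather than at the SDK.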
Issue Analytics
- State:
- Created 4 years ago
- Reactions:1
- Comments:9 (2 by maintainers)
Top GitHub Comments
Can you try this client, https://hub.docker.com/r/antsu/on-prem-client, to hit the container? Use this command on your Mac:
docker run --rm -it antsu/on-prem-client ./speech-to-text-client -r local --mac --expect "What's the weather like" ./audio/whatstheweatherlike.wav
If this works, then the issue is with the Speech SDK on OSX.

The SDK also doesn't support HTTP connections. I have been running a Flask server, getting an HTTP link, and trying to hit the API with it, but it doesn't respond. Is there a fix?
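The endpoint in the original report uses the `ws://` scheme rather than `http://`. As a rough sketch, an HTTP-style URL can be rewritten to its WebSocket counterpart with the standard library; the helper name `to_websocket_url` is hypothetical. This only changes the scheme: it assumes the server behind the URL actually speaks the WebSocket protocol the SDK expects, which a plain Flask HTTP route would not.

```python
from urllib.parse import urlsplit, urlunsplit


def to_websocket_url(url):
    """Map http -> ws and https -> wss, leaving every other URL part unchanged.

    Hypothetical helper; URLs that already use another scheme are returned
    with that scheme intact.
    """
    parts = urlsplit(url)
    scheme = {"http": "ws", "https": "wss"}.get(parts.scheme, parts.scheme)
    return urlunsplit((scheme, parts.netloc, parts.path, parts.query, parts.fragment))
```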