Speech-to-text conversion seems to handle at most 15-16 seconds of audio


Please provide us with the following information:

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Select any WAV audio file longer than 16 seconds. I tried converting files using both option 3 and option 6, but only the first 15-16 seconds of audio are recognized and converted into text. The rest of the audio is ignored, and there is no error message.
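To confirm that a test file really runs past the point where recognition stops, you can compute its duration from the WAV header as frames divided by frame rate. This is a minimal sketch using Python's standard-library `wave` module, assuming an uncompressed PCM WAV; `wav_duration_seconds` and `make_silent_wav` are hypothetical helper names, not part of the Speech SDK.

```python
import io
import wave


def wav_duration_seconds(source) -> float:
    """Return the duration of a PCM WAV file (path or file-like object) in seconds."""
    with wave.open(source, "rb") as wav:
        # duration = number of audio frames / frames per second
        return wav.getnframes() / wav.getframerate()


def make_silent_wav(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Hypothetical test helper: build an in-memory 16-bit mono WAV of silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    print(wav_duration_seconds(make_silent_wav(20)))
```

For a real recording, pass the file path instead, e.g. `wav_duration_seconds("my_audio.wav")` (the path is a placeholder).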

Any log messages given by the failure

Expected/desired behavior

Entire audio file should be converted.

OS and Version?

Windows 10 Pro

Versions

Mention any other details that might be useful

The REST API does mention that it handles a maximum of 15 seconds of audio, but that longer audio is handled by the SDK. I am using the SDK.
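If you have to stay on the REST path with its 15-second cap, one workaround (an alternative, not something the SDK or the maintainers prescribe here) is to split the file into pieces of at most 15 seconds and send each piece separately. This is a rough sketch with the standard-library `wave` module, assuming uncompressed PCM WAV input; `split_wav` is a hypothetical helper. Fixed-length cuts can split a word across two chunks, which is one reason SDK continuous recognition is the better fit for long audio.

```python
import io
import wave


def split_wav(source, max_seconds: float = 15.0) -> list:
    """Split a PCM WAV into consecutive WAV byte strings of at most max_seconds each."""
    chunks = []
    with wave.open(source, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(max_seconds * src.getframerate())
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:  # end of audio data
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                # Same channels/width/rate as the source; the frame count
                # in the header is corrected automatically when writing.
                dst.setparams(params)
                dst.writeframes(frames)
            chunks.append(buf.getvalue())
    return chunks
```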


Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 20 (10 by maintainers)

Top GitHub Comments

3 reactions
chlandsi commented, Jan 11, 2019

> The SDK for longer audio is in C#. Is it available for Python also?

Here is some example code showing how to use continuous recognition on audio files of arbitrary length:

import time

import azure.cognitiveservices.speech as speechsdk

# Placeholders: replace with your own Speech resource key, region and WAV file path.
speech_key = "YOUR_SUBSCRIPTION_KEY"
service_region = "YOUR_SERVICE_REGION"
audiofilename = "your_audio.wav"


def speech_recognize_continuous_from_file():
    """Performs continuous speech recognition with input from an audio file."""
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    audio_config = speechsdk.audio.AudioConfig(filename=audiofilename)

    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    done = False

    def stop_cb(evt):
        """Callback that signals the main loop to stop upon receiving an event `evt`."""
        nonlocal done
        print('CLOSING on {}'.format(evt))
        done = True

    # Connect callbacks to the events fired by the speech recognizer
    speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
    speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
    speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
    speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
    # Stop continuous recognition on either session stopped or canceled events
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)

    # Start continuous speech recognition and poll until a stop event arrives
    speech_recognizer.start_continuous_recognition()
    while not done:
        time.sleep(.5)

    # Stop from the main thread rather than from inside the event callback
    speech_recognizer.stop_continuous_recognition()


if __name__ == "__main__":
    speech_recognize_continuous_from_file()
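The example above only prints events. If you want the whole transcript back as one string, a variation is to collect the text of each `recognized` event and block on a `threading.Event` instead of polling with `time.sleep`. This is a sketch, not part of the SDK: `TranscriptCollector` and `transcribe_continuous` are hypothetical names. It assumes only the `recognized`/`session_stopped`/`canceled` signals with `connect`, the `start_continuous_recognition`/`stop_continuous_recognition` methods used above, and `evt.result.text` on recognition events.

```python
import threading


class TranscriptCollector:
    """Accumulates final recognition results and signals when the session ends.

    Works with any recognizer exposing `recognized`, `session_stopped` and
    `canceled` signals with a `connect(callback)` method, as the Speech SDK's
    SpeechRecognizer does.
    """

    def __init__(self):
        self.parts = []
        self.done = threading.Event()

    def on_recognized(self, evt):
        # Each `recognized` event carries one finished utterance; skip empty results
        if evt.result.text:
            self.parts.append(evt.result.text)

    def on_stop(self, evt):
        # Fired on session_stopped or canceled; releases the wait() below
        self.done.set()

    def attach(self, recognizer):
        recognizer.recognized.connect(self.on_recognized)
        recognizer.session_stopped.connect(self.on_stop)
        recognizer.canceled.connect(self.on_stop)

    def transcript(self):
        return " ".join(self.parts)


def transcribe_continuous(recognizer, timeout=None):
    """Run continuous recognition to completion and return the joined transcript."""
    collector = TranscriptCollector()
    collector.attach(recognizer)
    recognizer.start_continuous_recognition()
    finished = collector.done.wait(timeout)  # blocks instead of polling
    recognizer.stop_continuous_recognition()
    if not finished:
        raise TimeoutError("recognition did not finish within the timeout")
    return collector.transcript()
```

With the SDK you would pass in a `speechsdk.SpeechRecognizer` built exactly as in the example above. Note that a `canceled` event (for example, a bad key) also ends the wait, so in that case the transcript may be partial or empty.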

1 reaction
wolfma61 commented, Oct 2, 2018

Hey @lwluc, as Mark said, please open a new issue for new and unrelated questions in the future.

The sample is using RecognizeOnce … which limits recognition to 10-15 seconds. See https://docs.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/speechrecognizer?view=azure-node-latest: "recognizeOnceAsync: Starts speech recognition, and stops after the first utterance is recognized."
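In the Python SDK the counterpart of recognizeOnceAsync is `SpeechRecognizer.recognize_once()`, which returns a single result. This is a minimal sketch of what single-shot recognition amounts to. `transcribe_once` is a hypothetical helper, and `recognizer` is assumed to be a `SpeechRecognizer`, or anything with a `recognize_once()` method returning an object with a `.text` attribute.

```python
def transcribe_once(recognizer):
    """Single-shot recognition, the Python counterpart of recognizeOnceAsync.

    recognize_once() returns after the first utterance is recognized, so any
    audio after that point in the file is never transcribed.
    """
    result = recognizer.recognize_once()  # blocks until one utterance is recognized
    return result.text
```

Calling this once on a long file yields only the first utterance. That is consistent with the 10-15 second cutoff described in this issue, and it is why long files need continuous recognition instead.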

For long-running audio, you will need to use Start/StopRecognize (continuous recognition) and subscribe to the recognition events: https://docs.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/speechrecognizer?view=azure-node-latest#startcontinuousrecognitionasync

thx Wolfgang
