Speech to text conversion seems to handle max 15~16 seconds of audio
Please provide us with the following information:
This issue is for a: (mark with an x)
- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
Minimal steps to reproduce
Select any WAV audio file longer than 16 seconds. I tried converting files using both option 3 and option 6, but only the first 15~16 seconds of audio is recognized and converted into text. The rest of the audio is ignored. There is no error message.
Any log messages given by the failure
Expected/desired behavior
Entire audio file should be converted.
OS and Version?
Windows 7, 8 or 10. Linux (which distribution). Other.
Windows 10 Pro
Versions
Mention any other details that might be useful
The REST API documentation does mention that it handles a maximum of 15 seconds of audio, but says longer audio is handled by the SDK. I am using the SDK.
Thanks! We’ll be in touch soon.
Issue Analytics
- State:
- Created 5 years ago
- Comments:20 (10 by maintainers)
Top GitHub Comments
Here is some example code on how to use continuous recognition on audio files of arbitrary length:
Hey @lwluc - as Mark said, please open a new issue for new and unrelated questions in the future.
The sample is using RecognizeOnce, which limits recognition to 10-15 seconds. See https://docs.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/speechrecognizer?view=azure-node-latest (recognizeOnceAsync): "Starts speech recognition, and stops after the first utterance is recognized."
For long-running audio you will need to use startContinuousRecognitionAsync/stopContinuousRecognitionAsync and subscribe to the recognition events. https://docs.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/speechrecognizer?view=azure-node-latest#startcontinuousrecognitionasync
thx Wolfgang
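For anyone landing here later, the continuous-recognition approach described in the comments can be sketched roughly as follows. This is a minimal Python sketch, not the code from the thread: the thread links the JavaScript docs, and I am assuming the Python Speech SDK (`azure.cognitiveservices.speech`) with its `recognized`, `session_stopped` and `canceled` events. The helper names `transcribe_continuous` and `make_file_recognizer`, and the `key`, `region` and `wav_path` parameters, are made up for illustration.

```python
import threading


def transcribe_continuous(recognizer, timeout=None):
    """Collect every recognized utterance until the session stops or is canceled.

    `recognizer` is assumed to expose the Speech SDK's continuous-recognition
    surface: `recognized`, `session_stopped` and `canceled` signals with a
    `connect` method, plus start/stop_continuous_recognition().
    """
    done = threading.Event()
    parts = []

    def on_recognized(evt):
        # Each `recognized` event carries one finalized utterance.
        if evt.result.text:
            parts.append(evt.result.text)

    def on_stop(evt):
        # End of the audio file (or an error) ends the session.
        done.set()

    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(on_stop)
    recognizer.canceled.connect(on_stop)

    recognizer.start_continuous_recognition()
    done.wait(timeout)  # block until the whole file has been processed
    recognizer.stop_continuous_recognition()
    return " ".join(parts)


def make_file_recognizer(key, region, wav_path):
    """Hypothetical helper: build a recognizer that reads from a WAV file."""
    # Imported lazily so transcribe_continuous can be tried without the SDK.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    return speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )
```

Usage would look something like `transcribe_continuous(make_file_recognizer(key, region, "long.wav"))`. The difference from the single-shot call is that the `recognized` event fires once per utterance, so a file longer than 15 seconds yields several events rather than one truncated result.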