Speech: Include the 'diarization_config' parameter in the RecognitionConfig object.
Hello,
I am trying to include a SpeakerDiarizationConfig in the RecognitionConfig via the ‘diarization_config’ parameter, but I have not been able to, and I don’t see any example in the documentation that shows how to make it work.
My approach looks as follows:
diarization_config = {
    "enableSpeakerDiarization": True,
    "minSpeakerCount": 2,
    "maxSpeakerCount": 3,
}
config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=frame_rate,
    language_code="es-ES",
    enable_word_time_offsets=True,
    diarization_config=diarization_config,
    enable_automatic_punctuation=True,
)
As far as I understand, ‘diarization_config’ is supposed to be a SpeakerDiarizationConfig object, but I don’t understand how to use it properly.
My actual result is: “ValueError: Protocol message RecognitionConfig has no “diarization_config” field.” In contrast, my expected result is a transcript that includes the ‘speakerTag’ in the word list, like:
{ "startTime": "127.500s", "endTime": "127.700s", "word": "la", "speakerTag": 2 },
{ "startTime": "127.700s", "endTime": "129.300s", "word": "direcci\u00f3n.", "speakerTag": 2 }
Thanks in advance for your kind help.
I’ve started the process to release google-cloud-speech 1.3.0 with that feature.
@kamrankausar Speaker diarization is only a beta feature now, so you need to use the beta library.
See the code sample at https://cloud.google.com/speech-to-text/docs/multiple-voices
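For anyone landing here, below is a minimal sketch of how the config from the question might be built against the beta module. It assumes a 1.x release of google-cloud-speech where speech_v1p1beta1 exposes types.SpeakerDiarizationConfig; that is my assumption, not something confirmed in this thread. The helper names camel_to_snake, to_proto_fields and build_config are hypothetical, made up for illustration. The main point is that the Python message fields use snake_case names (enable_speaker_diarization, min_speaker_count, max_speaker_count), while the camelCase keys from the question are the REST/JSON spelling.

```python
import re


def camel_to_snake(name):
    """Convert a REST-style camelCase key to a snake_case proto field name."""
    # Insert "_" before every uppercase letter except a leading one, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def to_proto_fields(rest_style):
    """Rename the keys of a REST-style dict to snake_case (hypothetical helper)."""
    return {camel_to_snake(key): value for key, value in rest_style.items()}


def build_config(frame_rate):
    """Build a RecognitionConfig with diarization, using the beta module.

    Assumption: google-cloud-speech 1.x is installed and its
    speech_v1p1beta1 module provides `types` and `enums`.
    """
    # Imported lazily so the helpers above work without the library installed.
    from google.cloud import speech_v1p1beta1 as speech

    diarization_config = speech.types.SpeakerDiarizationConfig(
        **to_proto_fields(
            {
                "enableSpeakerDiarization": True,
                "minSpeakerCount": 2,
                "maxSpeakerCount": 3,
            }
        )
    )
    return speech.types.RecognitionConfig(
        encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=frame_rate,
        language_code="es-ES",
        enable_word_time_offsets=True,
        diarization_config=diarization_config,
        enable_automatic_punctuation=True,
    )
```

In practice you would probably just write the snake_case keyword arguments directly; the conversion helper is only there to show how the keys in the question map onto the message fields.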