
Speech: Include the 'diarization_config' parameter in the RecognitionConfig object.

See original GitHub issue

Hello,

I am trying to attach a SpeakerDiarizationConfig to the RecognitionConfig via the 'diarization_config' parameter, but I can't get it to work, and I don't see any example on the documentation page showing how to do it.

My approach looks as follows:

diarization_config = {
    "enableSpeakerDiarization": True,
    "minSpeakerCount": 2,
    "maxSpeakerCount": 3,
}

config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=frame_rate,
    language_code="es-ES",
    enable_word_time_offsets=True,
    diarization_config=diarization_config,
    enable_automatic_punctuation=True,
)

As far as I understand, 'diarization_config' is supposed to be a SpeakerDiarizationConfig object, but I don't see how to construct one properly.

My actual result is: ValueError: Protocol message RecognitionConfig has no "diarization_config" field. My expected result, in contrast, is a transcript whose word list includes a 'speakerTag', like:

{ "startTime": "127.500s", "endTime": "127.700s", "word": "la", "speakerTag": 2 },
{ "startTime": "127.700s", "endTime": "129.300s", "word": "dirección.", "speakerTag": 2 }

Thanks in advance for your kind help.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 12 (7 by maintainers)

Top GitHub Comments

tswast commented, Oct 11, 2019

@kamrankausar Speaker diarization is currently a beta-only feature, so you need to use the beta library:

from google.cloud import speech_v1p1beta1

client = speech_v1p1beta1.SpeechClient()

See the code sample at https://cloud.google.com/speech-to-text/docs/multiple-voices
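Putting the answer together, the flow might look like the sketch below. This is a hedged example, not code from the thread: the helper names, the GCS URI, and the speaker counts are placeholders, and it assumes the beta protobuf's snake_case field names (diarization_config, enable_speaker_diarization, min/max_speaker_count) rather than the camelCase keys from the original question.

```python
# Sketch only (assumptions noted above): diarization lives in the
# speech_v1p1beta1 surface, and the beta client accepts plain dicts
# shaped like the protobuf messages.

def build_diarization_config(language_code="es-ES", min_speakers=2, max_speakers=3):
    """Build a RecognitionConfig-shaped dict with a nested diarization config."""
    return {
        "encoding": "LINEAR16",
        "language_code": language_code,
        "enable_word_time_offsets": True,
        "enable_automatic_punctuation": True,
        "diarization_config": {
            "enable_speaker_diarization": True,
            "min_speaker_count": min_speakers,
            "max_speaker_count": max_speakers,
        },
    }


def transcribe_with_speakers(gcs_uri):
    """Call the beta API; requires google-cloud-speech and valid credentials."""
    from google.cloud import speech_v1p1beta1 as speech

    client = speech.SpeechClient()
    response = client.recognize(
        config=build_diarization_config(),
        audio={"uri": gcs_uri},  # placeholder bucket/object
    )
    # The last result's first alternative carries the per-word speaker tags.
    for word in response.results[-1].alternatives[0].words:
        print(word.start_time, word.end_time, word.word, word.speaker_tag)
```

Calling transcribe_with_speakers("gs://your-bucket/audio.wav") against a real project should then print each word with its speaker tag, which corresponds to the speakerTag output the question was expecting.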


