
Speech: Include the 'diarization_config' parameter in the RecognitionConfig object.

See original GitHub issue

Hello,

I am trying to attach a SpeakerDiarizationConfig to the RecognitionConfig via the 'diarization_config' parameter, but I can't get it to work, and I don't see any example on the documentation page showing how to do it.

My approach looks as follows:

diarization_config = {
    "enableSpeakerDiarization": True,
    "minSpeakerCount": 2,
    "maxSpeakerCount": 3,
}

config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=frame_rate,
    language_code="es-ES",
    enable_word_time_offsets=True,
    diarization_config=diarization_config,
    enable_automatic_punctuation=True,
)

As far as I understand, 'diarization_config' is supposed to be a SpeakerDiarizationConfig object, but I don't see how to construct one properly.

My actual result is: ValueError: Protocol message RecognitionConfig has no "diarization_config" field. My expected result, in contrast, is a transcript whose word list includes a 'speakerTag', like:

{ "startTime": "127.500s", "endTime": "127.700s", "word": "la", "speakerTag": 2 },
{ "startTime": "127.700s", "endTime": "129.300s", "word": "dirección.", "speakerTag": 2 }

Thanks in advance for your kind help.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 12 (7 by maintainers)

Top GitHub Comments

tswast commented, Oct 11, 2019

@kamrankausar Speaker diarization is currently a beta-only feature, so you need to use the beta library:

from google.cloud import speech_v1p1beta1

client = speech_v1p1beta1.SpeechClient()

See the code sample at https://cloud.google.com/speech-to-text/docs/multiple-voices
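Putting the answer together, the flow might look like the sketch below. This is a hedged example, not code from the thread: the helper names, the GCS URI, and the speaker counts are placeholders, and it assumes the beta protobuf's snake_case field names (diarization_config, enable_speaker_diarization, min/max_speaker_count) rather than the camelCase keys from the original question.

```python
# Sketch only (assumptions noted above): diarization lives in the
# speech_v1p1beta1 surface, and the beta client accepts plain dicts
# shaped like the protobuf messages.

def build_diarization_config(language_code="es-ES", min_speakers=2, max_speakers=3):
    """Build a RecognitionConfig-shaped dict with a nested diarization config."""
    return {
        "encoding": "LINEAR16",
        "language_code": language_code,
        "enable_word_time_offsets": True,
        "enable_automatic_punctuation": True,
        "diarization_config": {
            "enable_speaker_diarization": True,
            "min_speaker_count": min_speakers,
            "max_speaker_count": max_speakers,
        },
    }


def transcribe_with_speakers(gcs_uri):
    """Call the beta API; requires google-cloud-speech and valid credentials."""
    from google.cloud import speech_v1p1beta1 as speech

    client = speech.SpeechClient()
    response = client.recognize(
        config=build_diarization_config(),
        audio={"uri": gcs_uri},  # placeholder bucket/object
    )
    # The last result's first alternative carries the per-word speaker tags.
    for word in response.results[-1].alternatives[0].words:
        print(word.start_time, word.end_time, word.word, word.speaker_tag)
```

Calling transcribe_with_speakers("gs://your-bucket/audio.wav") against a real project should then print each word with its speaker tag, which corresponds to the speakerTag output the question was expecting.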


