
ASR pipeline does not work with openai/whisper on current master

See original GitHub issue

System Info

transformers @ git+https://github.com/huggingface/transformers.git@b651efe59ea506d38173e3a60a4228e7e74719f9
python 3.6
Standard AWS Ubuntu Deep Learning AMI (Ubuntu 18.04) Version 30.0

Who can help?

@Narsil @anton-l @sanchit-gandhi @patrickvonplaten

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

To reproduce, run the following code, based on the ASR pipeline example and Whisper:

from datasets import load_dataset
from transformers import pipeline

# Build an ASR pipeline from the Whisper checkpoint and transcribe one
# sample of the dummy LibriSpeech split, with long-form chunking enabled.
pipe = pipeline(model="openai/whisper-large")
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
output = pipe(ds[0]['file'], chunk_length_s=30, stride_length_s=(4, 2))

This yields:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-13-efceed64cd5c> in <module>
----> 1 output = pipe(ds[0]['file'], chunk_length_s=30, stride_length_s=(4, 2))

~/venv38/lib/python3.8/site-packages/transformers/pipelines/automatic_speech_recognition.py in __call__(self, inputs, **kwargs)
    181                         `"".join(chunk["text"] for chunk in output["chunks"])`.
    182         """
--> 183         return super().__call__(inputs, **kwargs)
    184 
    185     def _sanitize_parameters(self, **kwargs):

~/venv38/lib/python3.8/site-packages/transformers/pipelines/base.py in __call__(self, inputs, num_workers, batch_size, *args, **kwargs)
   1072             return self.iterate(inputs, preprocess_params, forward_params, postprocess_params)
   1073         else:
-> 1074             return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
   1075 
   1076     def run_multi(self, inputs, preprocess_params, forward_params, postprocess_params):

~/venv38/lib/python3.8/site-packages/transformers/pipelines/base.py in run_single(self, inputs, preprocess_params, forward_params, postprocess_params)
   1093     def run_single(self, inputs, preprocess_params, forward_params, postprocess_params):
   1094         all_outputs = []
-> 1095         for model_inputs in self.preprocess(inputs, **preprocess_params):
   1096             model_outputs = self.forward(model_inputs, **forward_params)
   1097             all_outputs.append(model_outputs)

~/venv38/lib/python3.8/site-packages/transformers/pipelines/automatic_speech_recognition.py in preprocess(self, inputs, chunk_length_s, stride_length_s)
    260             # Currently chunking is not possible at this level for `seq2seq` so
    261             # it's ok.
--> 262             align_to = self.model.config.inputs_to_logits_ratio
    263             chunk_len = int(round(chunk_length_s * self.feature_extractor.sampling_rate / align_to) * align_to)
    264             stride_left = int(round(stride_length_s[0] * self.feature_extractor.sampling_rate / align_to) * align_to)

~/venv38/lib/python3.8/site-packages/transformers/configuration_utils.py in __getattribute__(self, key)
    252         if key != "attribute_map" and key in super().__getattribute__("attribute_map"):
    253             key = super().__getattribute__("attribute_map")[key]
--> 254         return super().__getattribute__(key)
    255 
    256     def __init__(self, **kwargs):

AttributeError: 'WhisperConfig' object has no attribute 'inputs_to_logits_ratio'
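
The failing lookup can be reproduced in isolation. A minimal check (my own sketch, assuming the same transformers commit; Wav2Vec2Config is used here only as a CTC counterexample):

from transformers import Wav2Vec2Config, WhisperConfig

# CTC-style configs expose the ratio needed to align chunk boundaries to
# frame-level logits; Whisper's seq2seq config defines no such attribute.
print(hasattr(Wav2Vec2Config(), "inputs_to_logits_ratio"))  # True
print(hasattr(WhisperConfig(), "inputs_to_logits_ratio"))   # False -> the AttributeError above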

Expected behavior

I would’ve expected to obtain the transcript in output.
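
From the traceback, the failing code path is only entered when chunk_length_s is passed. As a stopgap (my assumption, not a maintainer-confirmed workaround), short clips can be transcribed by dropping the chunking arguments, since Whisper’s feature extractor pads/truncates inputs to its native 30-second window anyway:

from datasets import load_dataset
from transformers import pipeline

pipe = pipeline(model="openai/whisper-large")
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")

# Without chunk_length_s/stride_length_s, preprocess() never reads
# inputs_to_logits_ratio, so the AttributeError is not triggered.
output = pipe(ds[0]['file'])
print(output['text'])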

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 14 (11 by maintainers)

Top GitHub Comments

3 reactions
ArthurZucker commented, Oct 11, 2022

Really sorry about my miscommunication. The chunking that will be supported is different from CTC. Let’s organize a call to speak in more detail about that 😉 The goal would be to be able to specify a chunk length and stride length (if people want to customize them), but by default Whisper has its own parameters. Let’s talk more about that when we call 🤗
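
For illustration, the interface described above might look like the following sketch (hypothetical; the argument names mirror the existing CTC pipeline and the final design may differ):

# Default: rely on Whisper's own parameters (its native 30-second window).
output = pipe('long_audio.wav')  # placeholder path, not from the issue

# Customized: user-specified chunk and stride lengths, as described above.
output = pipe('long_audio.wav', chunk_length_s=30, stride_length_s=(4, 2))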

3 reactions
Narsil commented, Oct 11, 2022

@ArthurZucker are we sure Whisper can handle chunking?

Whisper is not a CTC model, meaning that chunking as shown in Nico’s blog does not work.

from internal conversation.

Happy to jump into a design call to discuss whether we can do it or not.

Not being CTC means it’s harder to handle the boundaries. Boundaries at silence are sort of OK, but unfortunately that can never really be a complete solution (because you can never be sure you’re going to get a silence, and you MUST be able to handle chunking regardless). This might be deemed acceptable in Whisper, btw, but when we checked for regular models, the regular silence detection was not good enough to be run automatically (meaning you always have to tune settings to get decent silence results with most silence detectors).
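
For context, the CTC-style chunking being contrasted here works roughly as in the sketch below (illustrative only, not the pipeline’s actual internals): the audio is split into overlapping windows, and at decode time the frame-level logits that fall inside each window’s left/right stride are dropped, so every frame is predicted with context on both sides. That dropping step is precisely what a seq2seq model like Whisper cannot do, because it emits tokens with no per-frame alignment to the input.

import numpy as np

def chunk_iter(audio: np.ndarray, chunk_len: int, stride_left: int, stride_right: int):
    # Advance by the non-overlapping core of each chunk.
    step = chunk_len - stride_left - stride_right
    for start in range(0, len(audio), step):
        chunk = audio[start : start + chunk_len]
        # Edge chunks keep their outer boundary: there is nothing to drop there.
        left = 0 if start == 0 else stride_left
        right = 0 if start + chunk_len >= len(audio) else stride_right
        yield chunk, (left, right)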


