Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Push to Hub fails with `model_name`

See original GitHub issue

System Info

transformers version: 4.25.0.dev0
Platform: Linux-5.15.0-48-generic-x86_64-with-glibc2.31
Python version: 3.9.13
Huggingface_hub version: 0.10.1
PyTorch version (GPU?): 1.13.0+cu117 (True)
Tensorflow version (GPU?): not installed (NA)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using GPU in script?: yes
Using distributed or parallel set-up in script?: no

Who can help?

@sanchit-gandhi

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, …)
My own task or dataset (give details below)

Reproduction

from datasets import load_dataset, DatasetDict

common_voice = DatasetDict()

#common_voice["train"] = load_dataset("mozilla-foundation/common_voice_11_0", "sv-SE", split="train+validation", use_auth_token=True)
#common_voice["test"] = load_dataset("mozilla-foundation/common_voice_11_0", "sv-SE", split="test", use_auth_token=True)

common_voice["train"] = load_dataset("mozilla-foundation/common_voice_11_0", "sv-SE", split="train[:1%]+validation[:1%]", use_auth_token=True)
common_voice["test"] = load_dataset("mozilla-foundation/common_voice_11_0", "sv-SE", split="test[:1%]", use_auth_token=True)

print(common_voice)

common_voice = common_voice.remove_columns(["accent", "age", "client_id", "down_votes", "gender", "locale", "path", "segment", "up_votes"])

print(common_voice)

from transformers import WhisperFeatureExtractor

feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-small")

from transformers import WhisperTokenizer

tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-small", language="swedish", task="transcribe")

from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-small", language="swedish", task="transcribe")

print(common_voice["train"][0])

from datasets import Audio

common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16000))


print(common_voice["train"][0])

def prepare_dataset(batch):
    # load and resample audio data from 48 to 16kHz
    audio = batch["audio"]

    # compute log-Mel input features from input audio array 
    batch["input_features"] = feature_extractor(audio["array"], sampling_rate=audio["sampling_rate"]).input_features[0]

    # encode target text to label ids 
    batch["labels"] = tokenizer(batch["sentence"]).input_ids
    return batch

common_voice = common_voice.map(prepare_dataset, remove_columns=common_voice.column_names["train"], num_proc=1)

import torch

from dataclasses import dataclass
from typing import Any, Dict, List, Union

@dataclass
class DataCollatorSpeechSeq2SeqWithPadding:
    processor: Any

    def __call__(self, features: List[Dict[str, Union[List[int], torch.Tensor]]]) -> Dict[str, torch.Tensor]:
        # split inputs and labels since they have to be of different lengths and need different padding methods
        # first treat the audio inputs by simply returning torch tensors
        input_features = [{"input_features": feature["input_features"]} for feature in features]
        batch = self.processor.feature_extractor.pad(input_features, return_tensors="pt")

        # get the tokenized label sequences
        label_features = [{"input_ids": feature["labels"]} for feature in features]
        # pad the labels to max length
        labels_batch = self.processor.tokenizer.pad(label_features, return_tensors="pt")

        # replace padding with -100 to ignore loss correctly
        labels = labels_batch["input_ids"].masked_fill(labels_batch.attention_mask.ne(1), -100)

        # if bos token is appended in previous tokenization step,
        # cut bos token here as it's append later anyways
        if (labels[:, 0] == self.processor.tokenizer.bos_token_id).all().cpu().item():
            labels = labels[:, 1:]

        batch["labels"] = labels

        return batch

"""Let's initialise the data collator we've just defined:"""

data_collator = DataCollatorSpeechSeq2SeqWithPadding(processor=processor)

import evaluate

metric = evaluate.load("wer")

def compute_metrics(pred):
    pred_ids = pred.predictions
    label_ids = pred.label_ids

    # replace -100 with the pad_token_id
    label_ids[label_ids == -100] = tokenizer.pad_token_id

    # we do not want to group tokens when computing the metrics
    pred_str = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
    label_str = tokenizer.batch_decode(label_ids, skip_special_tokens=True)

    wer = 100 * metric.compute(predictions=pred_str, references=label_str)

    return {"wer": wer}

from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

model.config.forced_decoder_ids = None
model.config.suppress_tokens = []

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-sv-test2",  # change to a repo name of your choice
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,  # increase by 2x for every 2x decrease in batch size
    learning_rate=1e-5,
    warmup_steps=500,
    max_steps=10,
    gradient_checkpointing=True,
    fp16=True,
    group_by_length=True,
    evaluation_strategy="steps",
    per_device_eval_batch_size=8,
    predict_with_generate=True,
    generation_max_length=225,
    save_steps=1000,
    eval_steps=1000,
    logging_steps=25,
    report_to=["tensorboard"],
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,
    push_to_hub=True,
)

from transformers import Seq2SeqTrainer

trainer = Seq2SeqTrainer(
    args=training_args,
    model=model,
    train_dataset=common_voice["train"],
    eval_dataset=common_voice["test"],
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    tokenizer=processor.feature_extractor,
)

trainer.train()

"""Our best WER is 32.0% - not bad for 8h of training data! We can submit our checkpoint to the [`hf-speech-bench`](https://huggingface.co/spaces/huggingface/hf-speech-bench) on push by setting the appropriate key-word arguments (kwargs):"""

kwargs = {
    "dataset_tags": "mozilla-foundation/common_voice_11_0",
    "dataset": "Common Voice 11.0",  # a 'pretty' name for the training dataset
    "language": "sv",
    #"model_name": "WhisperSmallSwedishBirgerMoell",  # a 'pretty' name for our model
    "finetuned_from": "openai/whisper-small",
    "tasks": "automatic-speech-recognition",
    "tags": "hf-asr-leaderboard",
}

trainer.push_to_hub(**kwargs)

from transformers import pipeline
import gradio as gr

pipe = pipeline(model="birgermoell/whisper-small-sv-test2")  # change to "your-username/the-name-you-picked"

def transcribe(audio):
    text = pipe(audio)["text"]
    return text

iface = gr.Interface(
    fn=transcribe, 
    inputs=gr.Audio(source="microphone", type="filepath"), 
    outputs="text",
    title="Whisper Small SV",
    description="Realtime demo for Swedish speech recognition using a fine-tuned Whisper small model.",
)

iface.launch()

Expected behavior

The following script is a downloaded version of the colab notebook that follows the whisper fine-tuning tutorial. https://colab.research.google.com/github/sanchit-gandhi/notebooks/blob/main/fine_tune_whisper.ipynb

One edit was that I removed the model name since I had an issue that it was complaining about two model names that made it impossible to upload. The script just runs on 1% of the dataset on 10 epochs.

kwargs = { “dataset_tags”: “mozilla-foundation/common_voice_11_0”, “dataset”: “Common Voice 11.0”, # a ‘pretty’ name for the training dataset “language”: “sv”, #“model_name”: “WhisperSmallSwedishBirgerMoell”, # a ‘pretty’ name for our model “finetuned_from”: “openai/whisper-small”, “tasks”: “automatic-speech-recognition”, “tags”: “hf-asr-leaderboard”, }

https://huggingface.co/birgermoell/whisper-small-sv-test2

I also ran into similar issues when I trained a model on the whole dataset.

https://huggingface.co/birgermoell/whisper-small-sv

Issue Analytics

State:
Created a year ago
Comments:16 (11 by maintainers)

Top GitHub Comments

1reaction

sanchit-gandhicommented, Nov 7, 2022

Absolutely! Expect a PR tomorrow!

1reaction

sguggercommented, Nov 7, 2022

Ah, indeed would be nice if the processors had that attribute!

Top Results From Across the Web

Problem with push_to_hub - Transformers

Hello everyone. I'm trying to upload my fine-tuned GPT-2 Model to Model Hub. When I try to use the uploading function push_to_hub I...

[Trainer] Fix model name in push_to_hub #20064 - GitHub

Trainer's push_to_hub fails if model_name is specified in the kwargs. The variable model_name is explicitly defined in the push_to_hub method: transformers/src/ ...

SentenceTransformer — Sentence-Transformers documentation

Uploads all elements of this Sentence Transformer to a new HuggingFace Hub repository. Parameters. repo_name – Repository name for your model in the...

Pushing docker image to new docker hub account fails

I am attempting to push to a private repository on docker hub but I've double checked that my auth matches up. Why is...

Wi-Fi connection is lost after router or network settings change

Select your network, and then enter the password. Note: If the Wireless Setup Wizard is not listed, select Wi-Fi Protected Setup or WPS,...