Loading the XLSR-53 model for a simple inference test
What is your question?
I am trying to write a script that runs a simple inference pass on an audio file using the XLSR-53 model. I am adapting patrickvonplaten’s script, which works for the English-only wav2vec2.
However, the wav2vec2 checkpoints use a .pt extension, while the XLSR-53 checkpoint ends up with a .pkl extension after extracting the .zip (downloaded from here: https://dl.fbaipublicfiles.com/fairseq/wav2vec/xlsr_53_56k.pt; note that while the URL uses a .pt extension, the downloaded file is a .zip).
I am unsure how to load the model properly, as I am quite new to this; apologies if the mistake is obvious.
Code
import fairseq
import torch
from datasets import load_dataset
import soundfile as sf
import numpy as np
from itertools import groupby

# Load a small dummy LibriSpeech validation set
libri_dummy = load_dataset(
    "patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")

def map_to_array(batch):
    speech_array, _ = sf.read(batch["file"])
    batch["speech"] = speech_array
    return batch

# Greedy decoder: collapse repeated tokens, drop "<s>" blanks,
# and turn the "|" word delimiters into spaces
class Decoder:
    def __init__(self, json_dict):
        self.dict = json_dict
        self.look_up = np.asarray(list(self.dict.keys()))

    def decode(self, ids):
        converted_tokens = self.look_up[ids]
        fused_tokens = [tok[0] for tok in groupby(converted_tokens)]
        output = ' '.join(
            ''.join(''.join(fused_tokens).split("<s>")).split("|"))
        return output

libri_dummy = libri_dummy.map(map_to_array, remove_columns=["file"])
input_sample = torch.tensor(libri_dummy[0]["speech"])[None, :]

# Load the extracted checkpoint (this is the line that fails)
model, cfg = fairseq.checkpoint_utils.load_model_ensemble(
    ['./xlsr_53_56k/archive/data.pkl'], arg_overrides={"data": 'dict'})
model = model[0]
model.eval()

logits = model(source=input_sample, padding_mask=None)["encoder_out"]
# logits = model(source=input_sample)["cpc_logits"]
# print(logits)
predicted_ids = torch.argmax(logits[:, 0], axis=-1)

json_dict = {"<s>": 0, "<pad>": 1, "</s>": 2, "<unk>": 3, "|": 4, "E": 5, "T": 6,
             "A": 7, "O": 8, "N": 9, "I": 10, "H": 11, "S": 12, "R": 13, "D": 14,
             "L": 15, "U": 16, "M": 17, "W": 18, "C": 19, "F": 20, "G": 21, "Y": 22,
             "P": 23, "B": 24, "V": 25, "K": 26, "'": 27, "X": 28, "J": 29, "Q": 30,
             "Z": 31}

decoder = Decoder(json_dict=json_dict)
print("Prediction: ", decoder.decode(predicted_ids))
The current error shows:

    line 37, in <module>
        model, cfg = fairseq.checkpoint_utils.load_model_ensemble(
      File "/home/seb/anaconda3/envs/aipytorch/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 297, in load_model_ensemble
        ensemble, args, _task = load_model_ensemble_and_task(
      File "/home/seb/anaconda3/envs/aipytorch/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 339, in load_model_ensemble_and_task
        state = load_checkpoint_to_cpu(filename, arg_overrides)
      File "/home/seb/anaconda3/envs/aipytorch/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 263, in load_checkpoint_to_cpu
        state = torch.load(f, map_location=torch.device("cpu"))
      File "/home/seb/anaconda3/envs/aipytorch/lib/python3.8/site-packages/torch/serialization.py", line 595, in load
        return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
      File "/home/seb/anaconda3/envs/aipytorch/lib/python3.8/site-packages/torch/serialization.py", line 764, in _legacy_load
        magic_number = pickle_module.load(f, **pickle_load_args)
    _pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified.
Any idea what I am doing wrong? And any tips on what I should change to get the rest of the script working?
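For what it's worth, a minimal loading sketch (an assumption on my part, not a confirmed fix): PyTorch >= 1.6 saves checkpoints in a zip-based container, so xlsr_53_56k.pt opening as a .zip is expected. The file should be passed to fairseq as-is; extracting the archive and pointing torch.load at the inner data.pkl strips away the surrounding container, which is exactly the situation that raises this UnpicklingError.

import fairseq
import torch

# Sketch: load the downloaded .pt file directly, without extracting it.
# torch.load (called internally by fairseq) handles the zip container itself.
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task(
    ["./xlsr_53_56k.pt"])
model = models[0]
model.eval()

Note also that xlsr_53_56k.pt is a pretrained checkpoint without a fine-tuned CTC head, so its forward pass yields latent features rather than character logits; the character-level decoder in the script above only makes sense for a checkpoint fine-tuned on labeled audio.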
What’s your environment?
- fairseq Version (e.g., 1.0 or master): 1.0.0a0+1e6323e
- PyTorch Version (e.g., 1.0): 1.7.1
- OS (e.g., Linux): Linux
- How you installed fairseq (pip, source): pip install fairseq
- Build command you used (if compiling from source):
- Python version: 3.8.5
- CUDA/cuDNN version:
- GPU models and configuration:
- Any other relevant information:
Top GitHub Comments
Yep, just checked the new HF integration. Nice. We are fine-tuning for German though 😃 And add ina-foss to your tool list.
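For anyone following that Hugging Face route, a rough sketch of inference with transformers (the checkpoint name facebook/wav2vec2-large-xlsr-53-german is an assumption here; substitute whichever fine-tuned XLSR-53 model you are using):

from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import soundfile as sf
import torch

# Assumed checkpoint; any CTC fine-tuned XLSR-53 model follows the same pattern.
MODEL_ID = "facebook/wav2vec2-large-xlsr-53-german"

processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
model.eval()

speech, sample_rate = sf.read("audio.wav")  # the model expects 16 kHz mono audio
inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids))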
The best result I've reached was with wav2vec_small.pt; I also managed to pretrain, fine-tune, etc. (in this case with mls_portuguese for my approach) and let it run for 3 days on a single T4 GPU. The WER dropped from 100 to ~20, but I've stopped training.
As for XLSR, I'm still unable to run it in the same pipeline as wav2vec_small.pt; I have no idea why.