
Loading XLSR53 Model for simple inference test

See original GitHub issue

What is your question?

I am trying to create a script that runs a simple inference on an audio file using the XLSR-53 model. I am adapting patrickvonplaten’s script, which works for the English-only wav2vec2.

However, the wav2vec2 checkpoints use a .pt extension, while the XLSR-53 checkpoint yields a .pkl file after extracting the .zip (downloaded from here: https://dl.fbaipublicfiles.com/fairseq/wav2vec/xlsr_53_56k.pt <-- note that while the URL uses a .pt extension, the downloaded file is a .zip).

I am unsure how to load the model properly, as I am quite new to this; apologies if the mistake is obvious.
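
As a side note on the file format (an observation for illustration, not something confirmed in this thread): checkpoints saved with recent PyTorch versions are zip archives internally, so a file that an archive tool identifies as a .zip can still be a regular .pt checkpoint that torch.load reads directly, without extracting it. A minimal sketch, assuming the download is kept unextracted as ./xlsr_53_56k.pt:

import zipfile

import torch

# Hypothetical local path to the unextracted download; adjust as needed.
ckpt_path = "./xlsr_53_56k.pt"

# PyTorch >= 1.6 writes checkpoints as a zip-based container, so this is
# expected to print True even though the file is a valid .pt checkpoint.
print("zip container:", zipfile.is_zipfile(ckpt_path))

# torch.load understands the container directly; no manual extraction needed.
state = torch.load(ckpt_path, map_location="cpu")
print(type(state))
if isinstance(state, dict):
    print(sorted(state.keys()))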

Code

import fairseq
import torch
from datasets import load_dataset
import soundfile as sf
import numpy as np
from itertools import groupby


# Load a small dummy LibriSpeech split for testing
libri_dummy = load_dataset(
    "patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")


def map_to_array(batch):
    speech_array, _ = sf.read(batch["file"])
    batch["speech"] = speech_array
    return batch


# Greedy decoder: collapse repeated ids and map them back to characters
class Decoder:
    def __init__(self, json_dict):
        self.dict = json_dict
        self.look_up = np.asarray(list(self.dict.keys()))

    def decode(self, ids):
        converted_tokens = self.look_up[ids]
        fused_tokens = [tok[0] for tok in groupby(converted_tokens)]
        output = ' '.join(
            ''.join(''.join(fused_tokens).split("<s>")).split("|"))
        return output
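
# (Illustrative example, not part of the original script:) with a toy
# vocabulary {"<s>": 0, "|": 1, "H": 2, "I": 3}, Decoder(...).decode([2, 2, 3, 1])
# collapses the repeated "H", turns "|" into a space, and returns "HI ".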


libri_dummy = libri_dummy.map(map_to_array, remove_columns=["file"])


# Take the first utterance and add a batch dimension
input_sample = torch.tensor(libri_dummy[0]["speech"])[None, :]

# Load the checkpoint from the extracted archive (this is the call that fails)
model, cfg = fairseq.checkpoint_utils.load_model_ensemble(
    ['./xlsr_53_56k/archive/data.pkl'], arg_overrides={"data": 'dict'})

model = model[0]
model.eval()

logits = model(source=input_sample, padding_mask=None)["encoder_out"]
#logits = model(source=input_sample)["cpc_logits"]
# print(logits)

predicted_ids = torch.argmax(logits[:, 0], axis=-1)
json_dict = {"<s>": 0, "<pad>": 1, "</s>": 2, "<unk>": 3, "|": 4, "E": 5, "T": 6, "A": 7, "O": 8, "N": 9, "I": 10, "H": 11, "S": 12, "R": 13, "D": 14,
             "L": 15, "U": 16, "M": 17, "W": 18, "C": 19, "F": 20, "G": 21, "Y": 22, "P": 23, "B": 24, "V": 25, "K": 26, "'": 27, "X": 28, "J": 29, "Q": 30, "Z": 31}

decoder = Decoder(json_dict=json_dict)
print("Prediction: ", decoder.decode(predicted_ids))

The current error shows:

  line 37, in <module>
    model, cfg = fairseq.checkpoint_utils.load_model_ensemble(
  File "/home/seb/anaconda3/envs/aipytorch/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 297, in load_model_ensemble
    ensemble, args, _task = load_model_ensemble_and_task(
  File "/home/seb/anaconda3/envs/aipytorch/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 339, in load_model_ensemble_and_task
    state = load_checkpoint_to_cpu(filename, arg_overrides)
  File "/home/seb/anaconda3/envs/aipytorch/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 263, in load_checkpoint_to_cpu
    state = torch.load(f, map_location=torch.device("cpu"))
  File "/home/seb/anaconda3/envs/aipytorch/lib/python3.8/site-packages/torch/serialization.py", line 595, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/seb/anaconda3/envs/aipytorch/lib/python3.8/site-packages/torch/serialization.py", line 764, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
  _pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified.

Any idea what I am doing wrong? And any tips on what I should change in my code to make it work further along?
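
For reference, a minimal loading sketch that skips the extracted archive/data.pkl path and hands the unextracted checkpoint straight to fairseq (the local path, the omitted arg_overrides, and the feature-extraction call are assumptions for illustration, not a fix confirmed in this thread):

import fairseq
import torch

# Hypothetical path to the unextracted checkpoint; adjust as needed.
ckpt_path = "./xlsr_53_56k.pt"

# load_model_ensemble_and_task also returns the task object; depending on the
# fairseq version an arg_overrides={"data": ...} entry (as in the script above)
# may still be required.
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task(
    [ckpt_path])

model = models[0]
model.eval()

# The pretrained-only XLSR-53 checkpoint is a wav2vec 2.0 encoder without a
# fine-tuned CTC head, so it returns features rather than character logits.
with torch.no_grad():
    dummy = torch.randn(1, 16000)  # one second of fake 16 kHz audio
    out = model(source=dummy, padding_mask=None, mask=False, features_only=True)
print(out["x"].shape)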

What’s your environment?

  • fairseq Version (e.g., 1.0 or master): 1.0.0a0+1e6323e

  • PyTorch Version (e.g., 1.0): 1.7.1

  • OS (e.g., Linux): linux

  • How you installed fairseq (pip, source): pip install fairseq

  • Build command you used (if compiling from source):

  • Python version: 3.8.5

  • CUDA/cuDNN version:

  • GPU models and configuration:

  • Any other relevant information:

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 14 (1 by maintainers)

Top GitHub Comments

1 reaction
olafthiele commented, Feb 9, 2021

Yep, just checked the new HF integration. Nice. We are fine-tuning for German though 😃 And add ina-foss to your tool list.

1 reaction
Ryojikn commented, Feb 9, 2021

The best result I’ve reached was opening up wav2vec_small.pt; I’ve also managed to pretrain, fine-tune, etc. with … in this case, mls_portuguese for my approach, and I let it run for 3 days on a single T4 GPU; the WER dropped from 100 to ~20, but I’ve stopped training.

As for XLSR, I’m still unable to run it in the same pipeline as wav2vec_small.pt, and I have no idea why.
