Loading the XLSR-53 model for a simple inference test
What is your question?
I am trying to write a script that runs a simple inference pass on an audio file using the XLSR-53 model. I am adapting patrickvonplaten’s script, which works for the English-only wav2vec2.
However, the wav2vec2 checkpoints use a .pt extension, while the XLSR-53 checkpoint ends up with a .pkl extension after extracting the .zip (downloaded from here: https://dl.fbaipublicfiles.com/fairseq/wav2vec/xlsr_53_56k.pt; note that while the URL uses a .pt extension, the downloaded file is a .zip).
I am unsure how to load the model properly, as I am quite new to this; apologies if the mistake is obvious.
Code
import fairseq
import torch
from datasets import load_dataset
import soundfile as sf
import numpy as np
from itertools import groupby

# Load a small dummy LibriSpeech validation set
libri_dummy = load_dataset(
    "patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")

def map_to_array(batch):
    speech_array, _ = sf.read(batch["file"])
    batch["speech"] = speech_array
    return batch

# Greedy decoder: collapse repeated tokens, drop "<s>" blanks,
# and turn the "|" word delimiters into spaces
class Decoder:
    def __init__(self, json_dict):
        self.dict = json_dict
        self.look_up = np.asarray(list(self.dict.keys()))

    def decode(self, ids):
        converted_tokens = self.look_up[ids]
        fused_tokens = [tok[0] for tok in groupby(converted_tokens)]
        output = ' '.join(
            ''.join(''.join(fused_tokens).split("<s>")).split("|"))
        return output

libri_dummy = libri_dummy.map(map_to_array, remove_columns=["file"])
input_sample = torch.tensor(libri_dummy[0]["speech"])[None, :]

# Load the extracted checkpoint (this is the line that fails)
model, cfg = fairseq.checkpoint_utils.load_model_ensemble(
    ['./xlsr_53_56k/archive/data.pkl'], arg_overrides={"data": 'dict'})
model = model[0]
model.eval()

logits = model(source=input_sample, padding_mask=None)["encoder_out"]
# logits = model(source=input_sample)["cpc_logits"]
# print(logits)
predicted_ids = torch.argmax(logits[:, 0], axis=-1)

json_dict = {"<s>": 0, "<pad>": 1, "</s>": 2, "<unk>": 3, "|": 4, "E": 5, "T": 6,
             "A": 7, "O": 8, "N": 9, "I": 10, "H": 11, "S": 12, "R": 13, "D": 14,
             "L": 15, "U": 16, "M": 17, "W": 18, "C": 19, "F": 20, "G": 21, "Y": 22,
             "P": 23, "B": 24, "V": 25, "K": 26, "'": 27, "X": 28, "J": 29, "Q": 30,
             "Z": 31}

decoder = Decoder(json_dict=json_dict)
print("Prediction: ", decoder.decode(predicted_ids))
The current error shows:

    line 37, in <module>
        model, cfg = fairseq.checkpoint_utils.load_model_ensemble(
      File "/home/seb/anaconda3/envs/aipytorch/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 297, in load_model_ensemble
        ensemble, args, _task = load_model_ensemble_and_task(
      File "/home/seb/anaconda3/envs/aipytorch/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 339, in load_model_ensemble_and_task
        state = load_checkpoint_to_cpu(filename, arg_overrides)
      File "/home/seb/anaconda3/envs/aipytorch/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 263, in load_checkpoint_to_cpu
        state = torch.load(f, map_location=torch.device("cpu"))
      File "/home/seb/anaconda3/envs/aipytorch/lib/python3.8/site-packages/torch/serialization.py", line 595, in load
        return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
      File "/home/seb/anaconda3/envs/aipytorch/lib/python3.8/site-packages/torch/serialization.py", line 764, in _legacy_load
        magic_number = pickle_module.load(f, **pickle_load_args)
    _pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified.
Any idea what I am doing wrong? And any tips on what I should change to get the rest of the script working?
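For what it's worth, a minimal loading sketch (an assumption on my part, not a confirmed fix): PyTorch >= 1.6 saves checkpoints in a zip-based container, so xlsr_53_56k.pt opening as a .zip is expected. The file should be passed to fairseq as-is; extracting the archive and pointing torch.load at the inner data.pkl strips away the surrounding container, which is exactly the situation that raises this UnpicklingError.

import fairseq
import torch

# Sketch: load the downloaded .pt file directly, without extracting it.
# torch.load (called internally by fairseq) handles the zip container itself.
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task(
    ["./xlsr_53_56k.pt"])
model = models[0]
model.eval()

Note also that xlsr_53_56k.pt is a pretrained checkpoint without a fine-tuned CTC head, so its forward pass yields latent features rather than character logits; the character-level decoder in the script above only makes sense for a checkpoint fine-tuned on labeled audio.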
What’s your environment?
- fairseq Version (e.g., 1.0 or master): 1.0.0a0+1e6323e
- PyTorch Version (e.g., 1.0): 1.7.1
- OS (e.g., Linux): Linux
- How you installed fairseq (pip, source): pip install fairseq
- Build command you used (if compiling from source):
- Python version: 3.8.5
- CUDA/cuDNN version:
- GPU models and configuration:
- Any other relevant information:
Top GitHub Comments
Yep, just checked the new HF integration. Nice. We are fine-tuning for German though 😃 And add ina-foss to your tool list.
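For anyone following that Hugging Face route, a rough sketch of inference with transformers (the checkpoint name facebook/wav2vec2-large-xlsr-53-german is an assumption here; substitute whichever fine-tuned XLSR-53 model you are using):

from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import soundfile as sf
import torch

# Assumed checkpoint; any CTC fine-tuned XLSR-53 model follows the same pattern.
MODEL_ID = "facebook/wav2vec2-large-xlsr-53-german"

processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
model.eval()

speech, sample_rate = sf.read("audio.wav")  # the model expects 16 kHz mono audio
inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids))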
The best result I've reached was with wav2vec_small.pt; I also managed to pretrain, fine-tune, etc. (in this case with mls_portuguese for my approach) and let it run for 3 days on a single T4 GPU. The WER dropped from 100 to ~20, but I've stopped training.
As for XLSR, I'm still unable to run it in the same pipeline as wav2vec_small.pt; I have no idea why.