
Obtaining output encodings for pretrained ASR models


Hello!

For the pretrained ASR models, is there an easy way of getting the tokens and token IDs of the ground-truth text? As far as I can tell, these encodings are obtained with spm_encode, but that requires the bpemodel, which is not packaged with the pretrained models:

$ tree 
.
|-- conf
|   |-- decode.yaml
|   `-- train_d6.yaml
|-- data
|   `-- train_trim_sp
|       `-- cmvn.ark
`-- exp
    |-- train_rnnlm_pytorch_lm_unigram500
    |   |-- model.json
    |   `-- rnnlm.model.best
    `-- train_trim_sp_pytorch_train_d6
        `-- results
            |-- model.json
            `-- model.last10.avg.best
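
For context, here is roughly what the encoding step looks like once a bpemodel is available, as a minimal sketch using the sentencepiece Python API; the file name bpe.model is a placeholder for exactly the artifact that is missing from the package:

import sentencepiece as spm

# Load the BPE model that spm_encode would read; the file name is a
# placeholder, since this file is what the pretrained package lacks.
sp = spm.SentencePieceProcessor(model_file="bpe.model")

text = "hello world"
pieces = sp.encode(text, out_type=str)  # subword tokens
ids = sp.encode(text, out_type=int)     # corresponding token IDs
print(pieces, ids)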

I could try to reverse engineer the process based on the char_list_dict in the RNN-LM model, but I wonder if there is a cleaner way of achieving this.
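
For reference, that reverse-engineering route might look like the sketch below; it assumes the RNN-LM's model.json exposes a char_list_dict field mapping each token to its ID (the field name is taken from the issue text, not verified against the file):

import json

# Rebuild a token -> ID map from the RNN-LM config shipped with the model.
with open("exp/train_rnnlm_pytorch_lm_unigram500/model.json") as f:
    conf = json.load(f)

char_list_dict = conf["char_list_dict"]  # assumed field name, per the issue

def encode(tokens):
    # Map each subword token to its ID, falling back to <unk> when present.
    unk = char_list_dict.get("<unk>")
    return [char_list_dict.get(t, unk) for t in tokens]

print(encode(["▁he", "llo"]))  # hypothetical subword pieces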

Thank you!

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
danoneata commented on Nov 15, 2019

@Fhrozen Thanks for the suggestion! That should work for now, but I agree with @sw005320 that it would be better to pack it with the model, because (i) it is inconvenient to download a large dataset and (ii) it can be error-prone to retrain, as it's not always clear which parameters were used for a given pretrained model. I believe that having access to the BPE model can be useful in a number of scenarios (e.g., fine-tuning).
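
To illustrate point (ii): retraining means reproducing every original training option exactly. A hypothetical sketch, guessing a unigram model with a 500-token vocabulary from the train_rnnlm_pytorch_lm_unigram500 directory name; the input path and remaining flags are assumptions:

import sentencepiece as spm

# Every parameter below is a guess that would have to match the original
# run exactly, which is what makes retraining error-prone.
spm.SentencePieceTrainer.train(
    input="data/train_trim_sp/text",  # assumed location of the training text
    model_prefix="bpe",
    vocab_size=500,                   # guessed from the "unigram500" suffix
    model_type="unigram",
)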

1 reaction
sw005320 commented on Nov 15, 2019

I think this is an important point. We should include the BPE model when we pack the model. I’ll ask people to do it.


