
Obtaining output encodings for pretrained ASR models


Hello!

For the pretrained ASR models, is there an easy way of getting the tokens and token IDs of the ground-truth text? As far as I can tell, these encodings are obtained with spm_encode, but that requires the bpemodel, which is not packaged with the pretrained models:

$ tree 
.
|-- conf
|   |-- decode.yaml
|   `-- train_d6.yaml
|-- data
|   `-- train_trim_sp
|       `-- cmvn.ark
`-- exp
    |-- train_rnnlm_pytorch_lm_unigram500
    |   |-- model.json
    |   `-- rnnlm.model.best
    `-- train_trim_sp_pytorch_train_d6
        `-- results
            |-- model.json
            `-- model.last10.avg.best
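
For context, here is roughly what the encoding step looks like once a bpemodel is available, as a minimal sketch using the sentencepiece Python API; the file name bpe.model is a placeholder for exactly the artifact that is missing from the package:

import sentencepiece as spm

# Load the BPE model that spm_encode would read; the file name is a
# placeholder, since this file is what the pretrained package lacks.
sp = spm.SentencePieceProcessor(model_file="bpe.model")

text = "hello world"
pieces = sp.encode(text, out_type=str)  # subword tokens
ids = sp.encode(text, out_type=int)     # corresponding token IDs
print(pieces, ids)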

I could try to reverse engineer the process based on the char_list_dict in the RNN-LM model, but I wonder if there is a cleaner way of achieving this.
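
For reference, that reverse-engineering route might look like the sketch below; it assumes the RNN-LM's model.json exposes a char_list_dict field mapping each token to its ID (the field name is taken from the issue text, not verified against the file):

import json

# Rebuild a token -> ID map from the RNN-LM config shipped with the model.
with open("exp/train_rnnlm_pytorch_lm_unigram500/model.json") as f:
    conf = json.load(f)

char_list_dict = conf["char_list_dict"]  # assumed field name, per the issue

def encode(tokens):
    # Map each subword token to its ID, falling back to <unk> when present.
    unk = char_list_dict.get("<unk>")
    return [char_list_dict.get(t, unk) for t in tokens]

print(encode(["▁he", "llo"]))  # hypothetical subword pieces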

Thank you!

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
danoneata commented on Nov 15, 2019

@Fhrozen Thanks for the suggestion! That should work for now, but I agree with @sw005320 that it would be better to pack it with the model, because (i) it is inconvenient to download a large dataset and (ii) it can be error-prone to retrain, as it's not always clear which parameters were used for a given pretrained model. I believe that having access to the BPE model can be useful in a number of scenarios (e.g., fine-tuning).
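
To illustrate point (ii): retraining means reproducing every original training option exactly. A hypothetical sketch, guessing a unigram model with a 500-token vocabulary from the train_rnnlm_pytorch_lm_unigram500 directory name; the input path and remaining flags are assumptions:

import sentencepiece as spm

# Every parameter below is a guess that would have to match the original
# run exactly, which is what makes retraining error-prone.
spm.SentencePieceTrainer.train(
    input="data/train_trim_sp/text",  # assumed location of the training text
    model_prefix="bpe",
    vocab_size=500,                   # guessed from the "unigram500" suffix
    model_type="unigram",
)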

1 reaction
sw005320 commented on Nov 15, 2019

I think this is an important point. We should include the BPE model when we pack the model. I’ll ask people to do it.


