How to score text with trained language model
I successfully trained a Transformer language model with fairseq. Now I would like to score text with this model.
This is what I am looking for:
echo "Input text to be scored by lm" | fairseq-score trained_model_path/checkpoint_best.pt
78.23 # example language model perplexity score for this sentence
Alternatively, something like
import torch
from fairseq.models.transformer_lm import TransformerLanguageModel
custom_lm = TransformerLanguageModel.from_pretrained('trained_model_path', 'checkpoint_best.pt')
custom_lm.score('Input text to be scored by lm')
# 78.23 # example language model perplexity score for this sentence
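The 78.23 in the sketches above stands for a perplexity. As a reminder of how such a number arises from a model's per-token log-probabilities, here is a minimal pure-Python sketch; the log-probabilities are hypothetical, not from any real model:

```python
import math

def perplexity(log_probs):
    """Perplexity = exp of the negated mean per-token log-probability (natural log)."""
    return math.exp(-sum(log_probs) / len(log_probs))

# Hypothetical log-probabilities an LM might assign to each token of the sentence
token_log_probs = [-4.2, -3.8, -5.0, -4.5, -3.9, -4.7]
print(round(perplexity(token_log_probs), 2))  # → 77.48
```

Any scoring route (CLI or Python) ultimately reduces to this computation over the model's per-token scores.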
Looking here:
https://github.com/pytorch/fairseq/tree/master/examples/language_model
and here:
https://fairseq.readthedocs.io/en/latest/command_line_tools.html#fairseq-eval-lm
it seems that I have to binarize my test data with fairseq-preprocess, which I want to avoid.
What is the easiest way to score plain text with a trained fairseq LM?
Issue Analytics
- Created: 4 years ago
- Comments: 7 (6 by maintainers)
Top GitHub Comments
My solution, without having much insight into torch and fairseq:
Then:
Added a .score function in 9d7725226da3fcd9c5d1ac02473289f53cd7dd78. It should be much faster than using generate. Usage:
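A sketch of how that helper can be called through fairseq's hub interface, assuming the paths from the question; the except-branch log-probabilities are made up so the snippet still runs when fairseq or the checkpoint is unavailable:

```python
import math

def perplexity(log_probs):
    # Perplexity is exp of the negated mean per-token log-probability
    return math.exp(-sum(log_probs) / len(log_probs))

try:
    from fairseq.models.transformer_lm import TransformerLanguageModel
    lm = TransformerLanguageModel.from_pretrained('trained_model_path', 'checkpoint_best.pt')
    # score() returns a dict; 'positional_scores' holds per-token log-probabilities
    token_log_probs = lm.score('Input text to be scored by lm')['positional_scores'].tolist()
except Exception:
    # Fallback with hypothetical scores, for when fairseq / the checkpoint is absent
    token_log_probs = [-4.4, -3.6, -5.1, -4.8, -3.7, -4.6]

print(round(perplexity(token_log_probs), 2))
```

Averaging the positional scores and exponentiating their negation reproduces the sentence-level perplexity the question asked for, without any fairseq-preprocess binarization step.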