How to run the LM to get the perplexity of independent sentences
Hi,
I’m trying to use the pretrained LM to score a set of independent sentences. I have written the sentences, one per line, in my test_file.txt, which I then binarize using preprocess.py like this:
python preprocess.py --only-source --testpref data-bin/test/test_file.txt --destdir data-bin/test/ --srcdict models/wiki103_fconv_lm/dict.txt
I then pass it through the trained LM:
python eval_lm.py data-bin/test --path '../../fairseq/models/wiki103_fconv_lm/wiki103.pt' --output-word-probs
I notice that the sentences are treated as one consecutive piece of text rather than as independent samples. For example, the same sentence gets a different score the second time it appears.
Could you please tell me how I can get perplexities for independent test data points?
Also, I would like to get conditional perplexities, for example of sentence 2 given sentence 1. I know the index where sentence 2 starts; call it i. I tried using the flag --output-word-probs and manually computed the perplexity taking into account only the words starting from index i. Is there an easier way to do this?
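For reference, the manual computation described above can be sketched as follows. This is an illustration with a hypothetical helper name (`conditional_perplexity`) and made-up log-probability values, assuming the per-word scores are natural-log probabilities; if the tool reports base-2 log probabilities instead, replace `math.exp` with `2 ** avg_nll`.

```python
import math

def conditional_perplexity(word_logprobs, start_index):
    """Perplexity of the words from start_index onward, i.e. the
    conditional perplexity of sentence 2 given sentence 1.

    word_logprobs: per-word natural-log probabilities for the whole
    sample (e.g. collected from --output-word-probs output).
    start_index: index i where sentence 2 starts.
    """
    tail = word_logprobs[start_index:]
    avg_nll = -sum(tail) / len(tail)  # average negative log-likelihood
    return math.exp(avg_nll)

# Made-up log-probs for a 5-word sample where sentence 2 starts at index 3:
logprobs = [-2.0, -1.5, -3.0, -1.0, -2.0]
print(conditional_perplexity(logprobs, 3))  # exp(1.5) ≈ 4.48
```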
Cheers, Oana
Issue Analytics
- Created 5 years ago
- Reactions: 1
- Comments: 5
Top GitHub Comments
@OanaMariaCamburu
Use the `--sample-break-mode eos` option with `eval_lm`. This ensures that individual sentences are converted into independent samples.

Could you help me with getting the perplexity of a single sentence from the pretrained language model?