How to run the LM to get the perplexity of independent sentences
Hi,
I’m trying to use the pretrained LM to score a set of independent sentences. I have written the sentences, one per line, in my test_file.txt, which I then binarize using preprocess.py like this:
python preprocess.py --only-source --testpref data-bin/test/test_file.txt --destdir data-bin/test/ --srcdict models/wiki103_fconv_lm/dict.txt
I then pass it through the trained LM:
python eval_lm.py data-bin/test --path '../../fairseq/models/wiki103_fconv_lm/wiki103.pt' --output-word-probs
I notice that the sentences are treated as one consecutive piece of text rather than as independent samples. For example, the same sentence gets a different score the second time it appears.
Could you please tell me how I can get perplexities for independent test data points?
Also, I would like to get conditional perplexities, for example of sentence 2 given sentence 1. I know the index where sentence 2 starts; call it i. I tried using the flag --output-word-probs and manually computed the perplexity taking into account only the words starting from index i. Is there an easier way to do this?
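For reference, the manual computation described above can be sketched as follows. This is an illustration with a hypothetical helper name (`conditional_perplexity`) and made-up log-probability values, assuming the per-word scores are natural-log probabilities; if the tool reports base-2 log probabilities instead, replace `math.exp` with `2 ** avg_nll`.

```python
import math

def conditional_perplexity(word_logprobs, start_index):
    """Perplexity of the words from start_index onward, i.e. the
    conditional perplexity of sentence 2 given sentence 1.

    word_logprobs: per-word natural-log probabilities for the whole
    sample (e.g. collected from --output-word-probs output).
    start_index: index i where sentence 2 starts.
    """
    tail = word_logprobs[start_index:]
    avg_nll = -sum(tail) / len(tail)  # average negative log-likelihood
    return math.exp(avg_nll)

# Made-up log-probs for a 5-word sample where sentence 2 starts at index 3:
logprobs = [-2.0, -1.5, -3.0, -1.0, -2.0]
print(conditional_perplexity(logprobs, 3))  # exp(1.5) ≈ 4.48
```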
Cheers, Oana
Issue Analytics
- Created 5 years ago
- Reactions: 1
- Comments: 5
Top GitHub Comments
@OanaMariaCamburu
Use the `--sample-break-mode eos` option with `eval_lm`. This ensures that individual sentences are converted into independent samples.

Could you help me with getting the perplexity of a single sentence from the pretrained language model?