
How to run the LM to get the perplexity of independent sentences


Hi,

I’m trying to use the pretrained LM to score a set of independent sentences. I have written the sentences, one per line, in my test_file.txt, which I then binarize using preprocess.py like this:

python preprocess.py --only-source --testpref data-bin/test/test_file.txt --destdir data-bin/test/ --srcdict models/wiki103_fconv_lm/dict.txt

I then pass it through the trained LM:

python eval_lm.py data-bin/test --path '../../fairseq/models/wiki103_fconv_lm/wiki103.pt' --output-word-probs

I notice that the sentences are treated as one consecutive piece of text rather than as independent inputs. For example, the same sentence gets a different score the second time it appears.

Could you please tell me how I can get perplexities for independent test data points?

Also, I would like to get conditional perplexities, for example of sentence 2 given sentence 1. I know the index where sentence 2 starts; call it i. I tried using the flag --output-word-probs and manually computed the perplexity taking into account only the words starting from index i. Is there an easier way to do this?
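The manual computation described above can be sketched as follows. This is a minimal example, not from the thread: `perplexity_from_index` is a hypothetical helper, and it assumes the per-word scores are natural-log probabilities (if the tool prints base-2 log-probs, swap `math.exp` for `2 ** avg_nll`).

```python
import math

def perplexity_from_index(word_logprobs, i):
    """Perplexity over the words starting at index i.

    word_logprobs: per-word log-probabilities, e.g. parsed from the
    output of eval_lm --output-word-probs (assumed to be natural logs).
    """
    tail = word_logprobs[i:]
    # Average negative log-likelihood over the conditioned span only.
    avg_nll = -sum(tail) / len(tail)
    return math.exp(avg_nll)

# Toy log-probs for a 5-word sample; say sentence 2 starts at index 3.
logps = [-2.0, -1.5, -3.0, -1.0, -2.5]
print(round(perplexity_from_index(logps, 3), 4))  # → 5.7546
```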

Cheers, Oana

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 1
  • Comments: 5

Top GitHub Comments

1 reaction
mhagiwara commented, May 29, 2019

@OanaMariaCamburu

I notice that the sentences are treated as one consecutive piece of text rather than as independent inputs. For example, the same sentence gets a different score the second time it appears.

Use the --sample-break-mode eos option with eval_lm. This ensures that individual sentences are converted into independent samples.
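A minimal invocation along those lines, reusing the data directory and checkpoint paths from the question (your paths may differ):

```shell
# Score each line of the binarized test set as an independent sample.
# --sample-break-mode eos breaks samples at end-of-sentence tokens,
# so a sentence's score no longer depends on the preceding lines.
python eval_lm.py data-bin/test \
    --path '../../fairseq/models/wiki103_fconv_lm/wiki103.pt' \
    --sample-break-mode eos \
    --output-word-probs
```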

0 reactions
RakshaAg commented, Nov 4, 2019

Could you help me with getting the perplexity of a single sentence for the pretrained language model?


