
Seq2Seq Metrics QOL: Bleu, Rouge

See original GitHub issue

Putting all my QOL issues here; I don't think I will have time to propose fixes, but I didn't want these to be lost in case they are useful. I tried using rouge and bleu for the first time and wrote down everything I didn't immediately understand:

What I tried

Rouge experience:


from datasets import load_metric

rouge = load_metric('rouge')
rouge.add_batch(['hi im sam'], ['im daniel'])  # fails (positional arguments are rejected)
rouge.add_batch(predictions=['hi im sam'], references=['im daniel'])  # works
rouge.compute()  # huge, messy output, but reasonable. Not worth integrating because I don't want to rewrite all the postprocessing.
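
For context, a minimal sketch of how to pull a single scalar out of that output (assuming, as I believe is the case for the datasets rouge metric, that compute() returns a dict mapping each rouge type to an AggregateScore with low/mid/high Score tuples):

from datasets import load_metric

rouge = load_metric('rouge')
results = rouge.compute(predictions=['hi im sam'], references=['im daniel'])
# Each entry, e.g. results['rouge1'], is an AggregateScore whose low/mid/high
# fields are Score tuples carrying precision, recall, and fmeasure.
print(results['rouge1'].mid.fmeasure)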

BLEU experience:

from datasets import load_metric

bleu = load_metric('bleu')
bleu.add_batch(predictions=['hi im sam'], references=['im daniel'])      # fails
bleu.add_batch(predictions=[['hi im sam']], references=[['im daniel']])  # also fails

Both of these raise ValueError: Got a string but expected a list instead: 'im daniel'
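
The error is bleu insisting on pre-tokenized input: each prediction must be a list of tokens, and references must be nested one level deeper to allow multiple references per sample (lhoestq's reply below spells out the exact format). A minimal sketch of a call that goes through, assuming whitespace tokenization is acceptable for illustration:

from datasets import load_metric

bleu = load_metric('bleu')
bleu.add_batch(
    predictions=['hi im sam'.split()],    # one tokenized prediction
    references=[['im daniel'.split()]],   # one sample with one tokenized reference
)
print(bleu.compute())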

Doc Typo

This says dataset = load_metric(...), which seems wrong and will cause a NameError.

[Screenshot of the documentation snippet showing dataset = load_metric(...)]
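
Presumably the snippet was meant to bind the metric to a name matching how it is used afterwards, something like this (a guess at the intended fix, not the actual docs text):

from datasets import load_metric

metric = load_metric('rouge')  # not: dataset = load_metric('rouge')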

cc @lhoestq, feel free to ignore.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 7
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

lhoestq commented, Jan 28, 2021 (8 reactions)

Hi !

As described in the documentation for bleu:

Args:
    predictions: list of translations to score.
        Each translation should be tokenized into a list of tokens.
    references: list of lists of references for each translation.
        Each reference should be tokenized into a list of tokens.

Therefore you can use this metric as follows:

from datasets import load_metric

predictions = [
    ["hello", "there", "general", "kenobi"],                             # tokenized prediction of the first sample
    ["foo", "bar", "foobar"]                                             # tokenized prediction of the second sample
]
references = [
    [["hello", "there", "general", "kenobi"], ["hello", "there", "!"]],  # tokenized references for the first sample (2 references)
    [["foo", "bar", "foobar"]]                                           # tokenized references for the second sample (1 reference)
]

bleu = load_metric("bleu")
bleu.compute(predictions=predictions, references=references)
# Or you can also add batches before calling compute()
# bleu.add_batch(predictions=predictions, references=references)
# bleu.compute()

Hope this helps 😃
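
As a quick usage note on the example above: if you start from raw strings, a whitespace split is the simplest way to get them into the shape bleu expects (a sketch; a real pipeline would want a proper tokenizer, and the result keys listed in the comment are what I believe the metric returns):

from datasets import load_metric

raw_predictions = ['hello there general kenobi', 'foo bar foobar']
raw_references = [['hello there general kenobi', 'hello there !'], ['foo bar foobar']]

predictions = [p.split() for p in raw_predictions]             # tokenize each prediction
references = [[r.split() for r in refs] for refs in raw_references]  # tokenize each reference

bleu = load_metric('bleu')
print(bleu.compute(predictions=predictions, references=references))
# The result should include keys like 'bleu', 'precisions', 'brevity_penalty',
# 'length_ratio', 'translation_length', and 'reference_length'.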

mrm8488 commented, Nov 12, 2020 (5 reactions)

So what is the right way to add a batch to compute BLEU?
