
Using Bleu for batch input


The example provided in the docs for Bleu is for a single input. In that case, the output from the engine should be in this format:


def evaluate_step():
  ...
  predictions = "Predicted Sentence 1".split()
  references = ["Reference Sentence 1".split(), "Reference Sentence 1.2".split()]
  return (predictions, references)
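
For context, here is a minimal sketch of how that single-sentence format could be wired into an evaluator; the Bleu import path and the use of default arguments are assumptions on my part, not taken from the issue:

# Minimal sketch (assumptions: ignite's Bleu metric is importable from
# ignite.metrics and its defaults are used as-is).
from ignite.engine import Engine
from ignite.metrics import Bleu

def evaluate_step(engine, batch):
    # A real pipeline would generate the candidate from the model here.
    predictions = "Predicted Sentence 1".split()
    references = ["Reference Sentence 1.1".split(), "Reference Sentence 1.2".split()]
    return predictions, references

evaluator = Engine(evaluate_step)
Bleu().attach(evaluator, "bleu")

state = evaluator.run([None])   # dummy one-item dataset
print(state.metrics["bleu"])    # near zero for these toy sentences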

When calculating the Bleu score for a batch, what is the format of the output from the engine? It should be something like:

# For a batch of size 2
predictions = ["Predicted Sentence 1".split(), "Predicted Sentence 2".split()]
references = [
    ["Reference Sentence 1.1".split(), "Reference Sentence 1.2".split()],
    ["Reference Sentence 2.1".split(), "Reference Sentence 2.2".split()],
]

Doing this gives an error.

TypeError: unhashable type: 'list'

The typing for update requires predictions to be Sequence[Any] and references to be Sequence[Sequence[Any]]. Does this mean batch input is not possible?
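
A plausible cause of that TypeError (an assumption, since the question does not show where the traceback originates): if the metric counts n-grams by putting token tuples into a collections.Counter, then with the nested batch format each "token" is itself a list, and a tuple containing lists cannot be hashed. A standalone illustration, not Ignite code:

from collections import Counter

# A tuple of string tokens (an n-gram) is hashable, so counting works:
Counter([("Predicted", "Sentence", "1")])

# With the batch format above, each "token" is itself a list of tokens,
# and a tuple containing lists cannot be hashed:
Counter([(["Predicted", "Sentence", "1"], ["Predicted", "Sentence", "2"])])
# TypeError: unhashable type: 'list'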

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 7 (6 by maintainers)

Top GitHub Comments

1 reaction
sdesrozis commented, Aug 25, 2021

What you mentioned is correct only if the input is the whole corpus.

The corpus is split into batches, so we have to first accumulate the n-gram counters, then finally compute the bp and mean once the corpus has been covered. This implies multiple update calls (accumulation) and one compute call (bp + mean). In other words, the micro average you suggested is not correct.

And if you consider distributed computing, each process holds a part of the corpus. Again, we accumulate, (sync), then bp + mean.
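
To make the accumulate-then-compute flow concrete, here is a minimal, library-free sketch of corpus-level BLEU bookkeeping across several update calls; it illustrates the idea only and is not Ignite's implementation:

from collections import Counter
from math import exp, log

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

class CorpusBleu:
    """Corpus-level BLEU: accumulate counts over updates, compute once at the end."""
    def __init__(self, max_n=4):
        self.max_n = max_n
        self.matches = [0] * max_n   # clipped n-gram matches, per order
        self.totals = [0] * max_n    # candidate n-gram counts, per order
        self.cand_len = 0            # total candidate length
        self.ref_len = 0             # total (closest) reference length

    def update(self, candidates, references):
        # candidates: list of token lists; references: list of lists of token lists
        for cand, refs in zip(candidates, references):
            self.cand_len += len(cand)
            self.ref_len += min((len(r) for r in refs),
                                key=lambda l: (abs(l - len(cand)), l))
            for n in range(1, self.max_n + 1):
                cand_ngrams = ngrams(cand, n)
                max_ref = Counter()
                for ref in refs:
                    max_ref |= ngrams(ref, n)    # element-wise max over references
                clipped = cand_ngrams & max_ref  # clip by reference counts
                self.matches[n - 1] += sum(clipped.values())
                self.totals[n - 1] += sum(cand_ngrams.values())

    def compute(self):
        if min(self.matches) == 0:
            return 0.0
        log_p = sum(log(m / t) for m, t in zip(self.matches, self.totals)) / self.max_n
        bp = 1.0 if self.cand_len > self.ref_len else exp(1 - self.ref_len / self.cand_len)
        return bp * exp(log_p)

Each update call would correspond to one batch (or, in the distributed case, to the shard of the corpus held by one process, with the counters summed across processes before compute), matching the accumulation-then-(bp + mean) flow described above.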

1 reaction
sdesrozis commented, Aug 25, 2021

Firstly, the _corpus_bleu should be split into two parts: (1) the accumulation of the n-gram counters (and maybe the lengths of cand/hyp too) and (2) the computation of bp, smoothing, and geometric mean. The sentence bleu score (i.e. macro avg) is (1) + (2) for each sentence and (3) average. The corpus bleu score (i.e. micro avg) is (1) for each sentence, then (2).

For the batch version, the macro avg means applying the sentence score to each sentence of the batch; we must add a loop over the batch, which is quite straightforward. The micro avg natively works with the batch version (apply (1) on the batch, then (2)). Both are illustrated in the sketch after this comment.

For DDP, the work is fine for the current macro avg. Concerning the micro avg version, a synchronization is needed to sum the different counters and lengths.

HTH
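
To illustrate the macro/micro distinction on a batch, here is a small sketch; NLTK's sentence_bleu and corpus_bleu are used only as a stand-in for steps (1)-(3) above, and the example sentences are made up, not taken from the issue:

# Macro vs. micro average over a batch of two candidates (illustration only,
# not Ignite's implementation).
from nltk.translate.bleu_score import sentence_bleu, corpus_bleu

predictions = ["the cat sat on the mat".split(), "a dog is in the house".split()]
references = [["the cat sat on the mat".split()],
              ["the dog is in the house".split()]]

# Macro avg: (1) + (2) per sentence, then (3) average over the batch.
macro = sum(sentence_bleu(refs, pred)
            for pred, refs in zip(predictions, references)) / len(predictions)

# Micro avg: (1) accumulated over the whole batch, then (2) once.
micro = corpus_bleu(references, predictions)

print(macro, micro)

For the DDP micro-average case, the per-order match/total counters and the lengths are exactly what would need to be summed across processes (e.g. with an all-reduce) before the final computation, as noted above.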


