Using Bleu for batch input
The example provided in the docs for Bleu is for a single input. In that case, the output from the engine is in this format:
def evaluate_step():
    ...
    predictions = "Predicted Sentence 1".split()
    references = ["Reference Sentence 1".split(), "Reference Sentence 1.2".split()]
    return (predictions, references)
When calculating the Bleu score for a batch, what is the format of the output from the engine? It should be something like:
# For a batch of size 2
predictions = ["Predicted Sentence 1".split(), "Predicted Sentence 2".split()]
references = [
    ["Reference Sentence 1.1".split(), "Reference Sentence 1.2".split()],
    ["Reference Sentence 2.1".split(), "Reference Sentence 2.2".split()],
]
Doing this gives an error:

TypeError: unhashable type: 'list'

The typing for update requires predictions to be a Sequence[Any] and references to be a Sequence[Sequence[Any]]. Does this mean that batch input is not possible?
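For reference, one workaround that keeps the documented single-input format is to loop over the batch and call update once per sentence. This is only a sketch: it assumes ignite.metrics.Bleu with the single-input update signature shown above, and the constructor argument is illustrative.

from ignite.metrics import Bleu

bleu = Bleu(ngram=4)  # illustrative constructor argument

# Batch in the format proposed above
predictions = ["Predicted Sentence 1".split(), "Predicted Sentence 2".split()]
references = [
    ["Reference Sentence 1.1".split(), "Reference Sentence 1.2".split()],
    ["Reference Sentence 2.1".split(), "Reference Sentence 2.2".split()],
]

# One update per sentence, matching the single-input format
for pred, refs in zip(predictions, references):
    bleu.update((pred, refs))

score = bleu.compute()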
What you mentioned is correct only if the input is the whole corpus.

The corpus is split into batches, so we first have to accumulate the n-gram counters, and only compute the bp and mean once the whole corpus has been covered. This implies multiple update calls (accumulation) and a single compute (bp + mean). In other words, the micro average you suggested is not correct.

And if you consider distributed computing, each process has a part of the corpus. Again, we accumulate, (sync,) then bp + mean.
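In code terms, the accumulate-then-compute flow looks like this (a sketch; corpus_batches and the bleu metric object are hypothetical placeholders, not a specific ignite API):

# Hypothetical: one update per batch of the corpus, a single compute at the end
for batch_predictions, batch_references in corpus_batches:
    bleu.update((batch_predictions, batch_references))
corpus_score = bleu.compute()  # bp + geometric mean over the accumulated counters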
Firstly, the _corpus_bleu function should be split into 2 parts: (1) the accumulation of the n-gram counters (and maybe the lengths of cand/hyp too) and (2) the computation of bp, smoothing and geometric mean. The sentence bleu score (i.e. macro avg) is (1) + (2) for each sentence and then (3) the average. The corpus bleu score (i.e. micro avg) is (1) for each sentence, then (2).

For the batch version, the macro avg means applying the sentence score to each sentence of the batch. We must add a loop over the batch, which is quite straightforward. The micro avg is natively fine with the batch version (apply (1) on the batch, then (2)).
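To make the split concrete, here is a self-contained sketch of the two parts for plain (unsmoothed) BLEU. MicroBleu is a hypothetical name for illustration, not ignite's implementation.

import math
from collections import Counter

def ngrams(tokens, n):
    # All n-grams of a token list, as a Counter
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

class MicroBleu:
    # Corpus (micro avg) BLEU: accumulate counters over batches with (1),
    # compute brevity penalty + geometric mean once at the end with (2).
    def __init__(self, max_order=4):
        self.max_order = max_order
        self.matched = [0] * max_order  # (1) clipped n-gram matches per order
        self.total = [0] * max_order    # (1) candidate n-gram totals per order
        self.cand_len = 0
        self.ref_len = 0

    def update(self, predictions, references):
        # Part (1): accumulate counters and lengths over a batch
        for cand, refs in zip(predictions, references):
            self.cand_len += len(cand)
            # closest reference length, ties broken by the shorter one
            self.ref_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
            for n in range(1, self.max_order + 1):
                cand_counts = ngrams(cand, n)
                max_ref = Counter()
                for r in refs:
                    max_ref |= ngrams(r, n)      # elementwise max over references
                clipped = cand_counts & max_ref  # clip candidate counts
                self.matched[n - 1] += sum(clipped.values())
                self.total[n - 1] += sum(cand_counts.values())

    def compute(self):
        # Part (2): brevity penalty, then geometric mean of the precisions
        if min(self.matched) == 0:
            return 0.0
        log_precisions = sum(math.log(m / t) for m, t in zip(self.matched, self.total))
        bp = 1.0 if self.cand_len > self.ref_len else math.exp(1 - self.ref_len / self.cand_len)
        return bp * math.exp(log_precisions / self.max_order)

def macro_bleu(predictions, references, max_order=4):
    # (1) + (2) per sentence, then (3) arithmetic mean over the sentences
    scores = []
    for cand, refs in zip(predictions, references):
        m = MicroBleu(max_order)
        m.update([cand], [refs])  # a one-sentence "corpus"
        scores.append(m.compute())
    return sum(scores) / len(scores)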
For DDP, this works fine for the current macro avg. Concerning the micro avg version, a synchronization is needed to sum the different counters and lengths.
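For that micro avg synchronization, a sketch with torch.distributed (assuming an initialized process group and the hypothetical MicroBleu counters from the sketch above):

import torch
import torch.distributed as dist

def sync_micro_counters(metric):
    # Sum the integer counters and lengths across all processes, so every
    # rank can then run part (2) on the global corpus statistics.
    stats = torch.tensor(
        metric.matched + metric.total + [metric.cand_len, metric.ref_len],
        dtype=torch.long,
    )
    dist.all_reduce(stats, op=dist.ReduceOp.SUM)
    n = metric.max_order
    metric.matched = stats[:n].tolist()
    metric.total = stats[n:2 * n].tolist()
    metric.cand_len = stats[2 * n].item()
    metric.ref_len = stats[2 * n + 1].item()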
HTH