Seq2Seq Metrics QOL: Bleu, Rouge
Putting all my QOL (quality-of-life) issues here; I don't think I will have time to propose fixes, but I didn't want these to be lost in case they are useful. I tried using `rouge` and `bleu` for the first time and wrote down everything I didn't immediately understand:
- Bleu expects pre-tokenized input. Can I pass tokenization as a kwarg, the way sacrebleu handles it?
- The two metrics have different signatures, which means I would have had to add a lot of conditionals plus pre- and post-processing if I were going to replace the `calculate_rouge` and `calculate_bleu` functions here: https://github.com/huggingface/transformers/blob/master/examples/seq2seq/utils.py#L61
What I tried
Rouge experience:
```python
rouge = load_metric('rouge')
rouge.add_batch(['hi im sam'], ['im daniel'])  # fails
rouge.add_batch(predictions=['hi im sam'], references=['im daniel'])  # works
rouge.compute()  # huge messy output, but reasonable. Not worth integrating b/c don't want to rewrite all the postprocessing.
```
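On the "huge messy output" point: in the `datasets` releases of that era, `rouge.compute()` returned, per rouge type, an aggregate with `low`/`mid`/`high` entries, each holding precision/recall/f-measure. A minimal sketch of the postprocessing a seq2seq script would need, using namedtuples to mock that nested structure (the structure is an assumption about that release, and `flatten_rouge` is a hypothetical helper, not library API):

```python
from collections import namedtuple

# Mocked stand-in for the nested result rouge.compute() returned at the time:
# each rouge type maps to an aggregate of low/mid/high Score entries.
Score = namedtuple("Score", ["precision", "recall", "fmeasure"])
Aggregate = namedtuple("Aggregate", ["low", "mid", "high"])

result = {
    "rouge1": Aggregate(Score(0.1, 0.1, 0.1), Score(0.2, 0.2, 0.2), Score(0.3, 0.3, 0.3)),
    "rougeL": Aggregate(Score(0.1, 0.1, 0.1), Score(0.25, 0.25, 0.25), Score(0.3, 0.3, 0.3)),
}

def flatten_rouge(result):
    """Keep only the mid f-measure per rouge type -- the single number most
    seq2seq training scripts log -- instead of the full nested breakdown."""
    return {key: agg.mid.fmeasure for key, agg in result.items()}

flatten_rouge(result)  # {'rouge1': 0.2, 'rougeL': 0.25}
```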
BLEU experience:
```python
bleu = load_metric('bleu')
bleu.add_batch(predictions=['hi im sam'], references=['im daniel'])
bleu.add_batch(predictions=[['hi im sam']], references=[['im daniel']])
bleu.add_batch(predictions=[['hi im sam']], references=[['im daniel']])
```
All of these raise `ValueError: Got a string but expected a list instead: 'im daniel'`
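The error happens because the calls above pass raw strings (or nested strings) where the metric wants token lists. A hedged sketch of the conversion, assuming whitespace tokenization is acceptable (`to_bleu_format` is a hypothetical helper, not part of `datasets`):

```python
def to_bleu_format(predictions, references):
    """Hypothetical helper: convert raw strings to the nesting the bleu
    metric expects -- a token list per prediction, and a *list of* token
    lists per reference entry (several references may back one prediction)."""
    return ([p.split() for p in predictions],
            [[r.split()] for r in references])

preds, refs = to_bleu_format(["hi im sam"], ["im daniel"])
# preds -> [['hi', 'im', 'sam']]
# refs  -> [[['im', 'daniel']]]
```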
Doc Typo
This says `dataset = load_metric(...)`, which seems wrong and will cause a NameError when later code refers to the metric by another name.
cc @lhoestq, feel free to ignore.
Issue Analytics
- Created: 3 years ago
- Reactions: 7
- Comments: 5 (3 by maintainers)
Top GitHub Comments
Hi!
As described in the documentation for `bleu`, you can use this metric this way:
Hope this helps 😃
So what is the right way to add a batch to compute BLEU?
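For anyone landing here with the same question: a hedged sketch of the input shapes the `bleu` metric expects, per its docstring at the time — each prediction is a list of tokens, and each reference entry is a list of token lists (one inner list per candidate reference). Whitespace tokenization is an assumption for illustration:

```python
predictions = ["hi im sam"]
references = ["im daniel"]

# Shape conversion: prediction -> token list; reference -> list of token
# lists, since several references may back a single prediction.
tok_preds = [p.split() for p in predictions]   # [['hi', 'im', 'sam']]
tok_refs = [[r.split()] for r in references]   # [[['im', 'daniel']]]

# The metric call itself (needs `datasets` installed and downloads the
# metric script on first use, so it is left commented here):
#   bleu = load_metric('bleu')
#   bleu.add_batch(predictions=tok_preds, references=tok_refs)
#   bleu.compute()
```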