Evaluating BART on CNN/DM : How to process dataset
From the README of BART for reproducing CNN/DM results:
Follow instructions here to download and process into data-files such that test.source and test.target has one line for each non-tokenized sample.
After following the instructions, I don't have files like test.source and test.target …
Instead, I have test.bin and a chunked version of this file (chunked/test_000.bin ~ chunked/test_011.bin).
How can I process test.bin into test.source and test.target?
Issue Analytics
- State:
- Created 4 years ago
- Comments:11 (3 by maintainers)
Top GitHub Comments
Here’s a version for Python 3 if anyone is interested:
https://github.com/artmatsak/cnn-dailymail
There are many details, so here is my code.
I fixed the excess length of train.bpe.source, which was caused by ASCII 0x0D (carriage return) characters in the articles, by splitting and re-joining the text.
I summarize several notes here:
code: https://gist.github.com/zhaoguangxiang/45bf39c528cf7fb7853bffba7fe57c7e
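For anyone who wants to see the carriage-return problem in isolation, here is a minimal sketch of the "split and join" idea from the comment above. It is not the code from the gist: the helper names (flatten, write_split, count_lines) and the sample strings are made up for illustration, and it assumes you already have the raw article/summary text as Python strings. A stray carriage return inside an article can be counted as a line break by downstream tools, so the .source file ends up with more lines than the .target file; collapsing all whitespace before writing keeps one sample per line.

```python
import os
import tempfile


def flatten(text: str) -> str:
    """Collapse every whitespace run (including '\r' and '\n') into one space."""
    return " ".join(text.split())


def write_split(pairs, prefix):
    """Write (article, summary) pairs to <prefix>.source and <prefix>.target,
    one flattened sample per line. Hypothetical helper, not part of fairseq."""
    with open(prefix + ".source", "w", encoding="utf-8", newline="\n") as src, \
         open(prefix + ".target", "w", encoding="utf-8", newline="\n") as tgt:
        for article, summary in pairs:
            src.write(flatten(article) + "\n")
            tgt.write(flatten(summary) + "\n")


def count_lines(path):
    """Count lines the strict way: '\r' alone also counts as a line break."""
    with open(path, encoding="utf-8", newline="") as f:
        return len(f.read().splitlines())


# Made-up samples; the first article contains a bare carriage return (0x0D).
samples = [
    ("First sentence.\rSecond sentence.\nThird.", "A summary.\r\nMore."),
    ("Another article.", "Another summary."),
]

with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "test")
    write_split(samples, prefix)
    n_source = count_lines(prefix + ".source")
    n_target = count_lines(prefix + ".target")

print(n_source, n_target)
```

If the two counts differ from the number of samples, some line-breaking character is still getting through and the source and target files will be misaligned at evaluation time.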