
Evaluating BART on CNN/DM : How to process dataset


From the README of BART for reproducing CNN/DM results :

Follow instructions here to download and process into data-files such that test.source and test.target has one line for each non-tokenized sample.

After following the instructions, I don’t have files like test.source and test.target.

Instead, I have test.bin and a chunked version of this file
(chunked/test_000.bin ~ chunked/test_011.bin).

How can I process test.bin into test.source and test.target ?

@ngoyal2707 @yinhanliu

Issue Analytics

State:
Created 4 years ago
Comments: 11 (3 by maintainers)

Top GitHub Comments

9 reactions

artmatsak commented, Jan 27, 2020

Here’s a version for Python 3 if anyone is interested:

https://github.com/artmatsak/cnn-dailymail

9 reactions

zhaoguangxiang commented, Dec 6, 2019

There are many details; here is my code.

I fixed the over-length train.bpe.source, caused by the ASCII 0x0D (carriage return) character in articles, by splitting and re-joining the text.

I summarize several notes here:

Remove the “ ” before “.”.
Keep the text cased: remove the line that lowercases it.
“\r” in the original articles causes errors in BPE preprocessing.
Remove “(CNN)”.
Apply BPE encoding.

code : https://gist.github.com/zhaoguangxiang/45bf39c528cf7fb7853bffba7fe57c7e
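For readers who want a feel for the text cleanup steps in the notes above, here is a minimal sketch. It is not the code from the linked gist; the function name `clean_sample` and the order of the steps are assumptions made for illustration. BPE encoding is left out because it depends on the fairseq tooling.

```python
import re


def clean_sample(text: str) -> str:
    """Collapse one article or summary into a single cleaned line.

    Hypothetical helper for illustration only, not the gist's code.
    """
    # str.split() with no arguments splits on any whitespace, including
    # "\r" and "\n", so split-and-join removes stray carriage returns
    # and keeps each sample on exactly one line.
    text = " ".join(text.split())
    # Drop the "(CNN)" marker that opens many CNN articles.
    text = text.replace("(CNN)", "")
    # Remove the whitespace that tokenization leaves before a period.
    text = re.sub(r"\s+\.", ".", text)
    # Normalize whitespace again after the removals above.
    # Note: there is no .lower() call, so the text stays cased.
    return " ".join(text.split())


if __name__ == "__main__":
    raw = "(CNN) Hello world .\r\nSecond line ."
    print(clean_sample(raw))
```

Each cleaned sample would then be written as one line of test.source (articles) or test.target (summaries), so that the two files stay aligned line by line.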
