OpenBookQA has missing and inconsistent field names
Describe the bug
OpenBookQA implementation is inconsistent with the original dataset.
We need to:
- Unflatten the dataset field `[question][stem]`, which is currently flattened into `question_stem`, so that it matches the original format.
- Add the missing additional fields:
  - `'fact1': row['fact1'],`
  - `'humanScore': row['humanScore'],`
  - `'clarity': row['clarity'],`
  - `'turkIdAnonymized': row['turkIdAnonymized']`
- Ensure the structure and every data item in the original OpenBookQA matches our OpenBookQA version.
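The first two items above can be sketched in plain Python as follows. This is only an illustration, not the actual loader code: the helper name `restore_original_layout` and the sample row values are hypothetical, and it assumes each row is a dict whose flattened key is `question_stem`. Any other keys, such as the answer choices, are passed through unchanged.

```python
from typing import Any, Dict

# Field names taken from the issue text; whether a given row actually
# carries them depends on which OpenBookQA file it came from (assumption).
EXTRA_FIELDS = ("fact1", "humanScore", "clarity", "turkIdAnonymized")


def restore_original_layout(row: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: nest `question_stem` back under `question`.

    Extra fields that are absent from `row` are set to None rather than
    raising, so the output always has the same set of keys.
    """
    # Copy every key except the flattened one.
    restored = {k: v for k, v in row.items() if k != "question_stem"}
    # Rebuild the nested [question][stem] structure.
    restored["question"] = {"stem": row["question_stem"]}
    # Carry over (or default) the additional fields.
    for name in EXTRA_FIELDS:
        restored[name] = row.get(name)
    return restored


# Made-up sample row, for illustration only.
flat_row = {
    "id": "example-1",
    "question_stem": "An example question?",
    "answerKey": "A",
    "fact1": "an example fact",
}
restored_row = restore_original_layout(flat_row)
```

Defaulting absent fields to `None` is a design choice for this sketch; a loader could equally raise an error when a field is missing.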
Expected results
The structure and every data item in the original OpenBookQA matches our OpenBookQA version.
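One way to check this expectation is to compare the two versions row by row and key by key. The sketch below is a hypothetical helper, not part of `datasets`. It assumes both versions have already been loaded as lists of dicts in the same order, and the sample rows are made up.

```python
from typing import Any, Dict, List, Tuple

# Sentinel so that a missing key is distinguishable from a None value.
_MISSING = object()


def find_mismatches(
    original: List[Dict[str, Any]], ours: List[Dict[str, Any]]
) -> List[Tuple[int, str]]:
    """Return (row_index, key) pairs where the two versions differ.

    A length difference is reported as (-1, "row_count").
    """
    mismatches: List[Tuple[int, str]] = []
    if len(original) != len(ours):
        mismatches.append((-1, "row_count"))
    for index, (orig_row, our_row) in enumerate(zip(original, ours)):
        # Check every key that appears in either row.
        for key in sorted(set(orig_row) | set(our_row)):
            if orig_row.get(key, _MISSING) != our_row.get(key, _MISSING):
                mismatches.append((index, key))
    return mismatches


# Made-up rows: the second version is flattened and lacks `fact1`.
original_rows = [{"id": "x", "question": {"stem": "s"}, "fact1": "f"}]
our_rows = [{"id": "x", "question_stem": "s"}]
diff = find_mismatches(original_rows, our_rows)
```

An empty result would mean that the structure and every data item match under these assumptions.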
Actual results
TBD
Environment info
- `datasets` version: 2.1.0
- Platform: macOS-10.15.7-x86_64-i386-64bit
- Python version: 3.8.13
- PyArrow version: 7.0.0
- Pandas version: 1.4.2
Issue Analytics
- State:
- Created a year ago
- Comments: 11 (10 by maintainers)
Top GitHub Comments
IMO we should always try to preserve the original structure unless there is a good reason not to (and I don’t see one in this case).
Indeed, thanks @osbm. I'm closing this issue, if that's fine with you all.