OpenBookQA has missing and inconsistent field names

Describe the bug

The OpenBookQA implementation is inconsistent with the original dataset.

We need to:

- The dataset field [question][stem] is flattened into question_stem. Unflatten it to match the original format.
- Add the missing additional fields:
  - 'fact1': row['fact1'],
  - 'humanScore': row['humanScore'],
  - 'clarity': row['clarity'],
  - 'turkIdAnonymized': row['turkIdAnonymized']
- Ensure that the structure and every data item in the original OpenBookQA match our OpenBookQA version.
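
The changes listed above could be sketched roughly as follows. This is a minimal illustration on plain Python dicts, not the actual loading script: the helper names `unflatten_row` and `missing_extra_fields` are hypothetical, the sample row is made up, and only the field names (`question_stem`, `fact1`, `humanScore`, `clarity`, `turkIdAnonymized`) come from the issue.

```python
from typing import Any, Dict, List

# Extra fields the issue lists as missing from the loaded dataset.
EXTRA_FIELDS = ("fact1", "humanScore", "clarity", "turkIdAnonymized")


def unflatten_row(flat_row: Dict[str, Any]) -> Dict[str, Any]:
    """Move the flattened `question_stem` back under a nested `question` dict.

    Hypothetical helper: all other keys are passed through unchanged.
    """
    nested = {k: v for k, v in flat_row.items() if k != "question_stem"}
    nested["question"] = {"stem": flat_row["question_stem"]}
    return nested


def missing_extra_fields(row: Dict[str, Any]) -> List[str]:
    """Return the extra field names from the issue that are absent in `row`."""
    return [name for name in EXTRA_FIELDS if name not in row]


# Made-up row for illustration only; not real dataset content.
flat = {
    "id": "example-1",
    "question_stem": "An example question?",
    "answerKey": "A",
    "fact1": "An example fact.",
}

nested = unflatten_row(flat)
print(nested["question"]["stem"])
print(missing_extra_fields(nested))
```

Running a check like `missing_extra_fields` over a few rows is one way to confirm that the additional fields actually made it through after the loader is updated.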

Expected results

The structure and every data item in the original OpenBookQA match our OpenBookQA version.

Actual results

TBD

Environment info

datasets version: 2.1.0
Platform: macOS-10.15.7-x86_64-i386-64bit
Python version: 3.8.13
PyArrow version: 7.0.0
Pandas version: 1.4.2

Issue Analytics

Created a year ago
Comments: 11 (10 by maintainers)

Top GitHub Comments

2 reactions

mariosasko commented, May 4, 2022

IMO we should always try to preserve the original structure unless there is a good reason not to (and I don’t see one in this case).

1 reaction

lhoestq commented, Oct 5, 2022

Indeed @osbm thanks. I’m closing this issue if it’s fine for you all then
