OpenBookQA has missing and inconsistent field names

Describe the bug

The OpenBookQA implementation is inconsistent with the original dataset.

We need to:

  1. Unflatten the dataset field question_stem back into [question][stem] so that it matches the original format.
  2. Add the missing additional fields:
    • 'fact1': row['fact1'],
    • 'humanScore': row['humanScore'],
    • 'clarity': row['clarity'],
    • 'turkIdAnonymized': row['turkIdAnonymized']
  3. Ensure that the structure and every data item of our OpenBookQA version match the original OpenBookQA.
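
The first two steps could look roughly like the sketch below. This is only an illustration in plain Python, not the actual loading script: the helper name unflatten_row, the source_row argument, and the sample values are hypothetical, and only question_stem and the four additional field names come from the issue itself.

```python
# Field names listed in the issue as missing from our version.
ADDITIONAL_FIELDS = ("fact1", "humanScore", "clarity", "turkIdAnonymized")


def unflatten_row(flat_row, source_row=None):
    """Move `question_stem` under `question["stem"]` and copy extra fields.

    `flat_row` is a row in the flattened layout; `source_row` (hypothetical)
    is the matching row of the original data that carries the extra fields.
    """
    # Keep every key except the flattened one.
    nested = {k: v for k, v in flat_row.items() if k != "question_stem"}

    # Re-create the nested `question` mapping without mutating the input.
    question = dict(nested.get("question", {}))
    question["stem"] = flat_row["question_stem"]
    nested["question"] = question

    # Copy the additional fields only when the source row provides them.
    if source_row is not None:
        for name in ADDITIONAL_FIELDS:
            if name in source_row:
                nested[name] = source_row[name]
    return nested


# Made-up sample values, for illustration only.
flat = {"id": "q1", "question_stem": "Example stem?", "answerKey": "A"}
source = {"fact1": "example fact", "humanScore": "1.00",
          "clarity": "2.00", "turkIdAnonymized": "abc123"}
row = unflatten_row(flat, source)
```

Leaving all other keys untouched keeps the sketch neutral about the rest of the schema, which the issue does not spell out.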

Expected results

The structure and every data item of our OpenBookQA version match the original OpenBookQA.
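
One way to check this expectation is to compare an original record with ours and list the paths that differ. The sketch below is a generic recursive comparison in plain Python; diff_records and the sample records are hypothetical and are not part of the datasets library or the original issue.

```python
def diff_records(original, ours, path=""):
    """Return the dotted paths at which two records differ."""
    if isinstance(original, dict) and isinstance(ours, dict):
        diffs = []
        # Visit the union of keys so fields missing on either side are reported.
        for key in sorted(set(original) | set(ours)):
            child = f"{path}.{key}" if path else key
            if key not in original or key not in ours:
                diffs.append(child)
            else:
                diffs.extend(diff_records(original[key], ours[key], child))
        return diffs
    # Non-dict values (strings, numbers, lists) are compared directly.
    return [] if original == ours else [path]


# Made-up records mirroring the mismatch described in the issue.
original_record = {"id": "q1", "question": {"stem": "Example stem?"}, "fact1": "example fact"}
our_record = {"id": "q1", "question_stem": "Example stem?"}
mismatches = diff_records(original_record, our_record)
```

An empty list means the two records agree in both structure and values; here the list names the missing nested field, the flattened field, and the missing additional field.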

Actual results

TBD

Environment info

  • datasets version: 2.1.0
  • Platform: macOS-10.15.7-x86_64-i386-64bit
  • Python version: 3.8.13
  • PyArrow version: 7.0.0
  • Pandas version: 1.4.2

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 11 (10 by maintainers)

Top GitHub Comments

2 reactions
mariosasko commented, May 4, 2022

IMO we should always try to preserve the original structure unless there is a good reason not to (and I don’t see one in this case).

1 reaction
lhoestq commented, Oct 5, 2022

Indeed @osbm thanks. I’m closing this issue if it’s fine for you all then
