Feature Request: Question Answering dataset

Describe the problem

Given the relative scarcity of tools for building question-answering datasets, it would be great if doccano could serve that purpose. At its core, I believe each document would be a passage, with forms below it (as in a Translation project) for entering questions. You would then need the ability to annotate stretches of text as the answer (similar to a Sequence Labeling project). The tricky part would likely be tying the labels for each answer annotation back to the input questions.

A naive first pass could be to combine the SequenceLabeling and MachineTranslation UIs. The labels for the answers could simply be answer-to-Q1, answer-to-Q2, and so on, and each document could have a maximum number of questions associated with it.
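As a rough illustration of how the answer labels could be tied back to the questions, here is a minimal Python sketch. It is not doccano code: the `Span` class, the `link_answers` function, and the exact `answer-to-Q<n>` label pattern are hypothetical names chosen to mirror the naming proposed above.

```python
import re
from dataclasses import dataclass


@dataclass
class Span:
    """A labeled stretch of the passage (hypothetical structure)."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # e.g. "answer-to-Q1"


# Assumed label convention: "answer-to-Q" followed by a 1-based question number.
LABEL_PATTERN = re.compile(r"^answer-to-Q(\d+)$")


def link_answers(passage, questions, spans):
    """Pair each question with the spans whose label points at it."""
    linked = [{"question": q, "answers": []} for q in questions]
    for span in spans:
        match = LABEL_PATTERN.match(span.label)
        if match is None:
            continue  # ignore labels that are not answer labels
        index = int(match.group(1)) - 1  # labels are 1-based
        if not 0 <= index < len(linked):
            raise ValueError(f"{span.label} has no matching question")
        linked[index]["answers"].append(
            {"text": passage[span.start:span.end], "answer_start": span.start}
        )
    return linked


passage = "The Eiffel Tower is in Paris. It was completed in 1889."
questions = ["Where is the Eiffel Tower?", "When was it completed?"]
paris = passage.index("Paris")
year = passage.index("1889")
spans = [
    Span(paris, paris + len("Paris"), "answer-to-Q1"),
    Span(year, year + len("1889"), "answer-to-Q2"),
]
linked = link_answers(passage, questions, spans)
```

Encoding the question number in the label keeps the annotation step identical to ordinary sequence labeling; the cost is that the label set has to grow with the maximum number of questions per document.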

Does this sound doable?

Issue Analytics

  • State: open
  • Created 5 years ago
  • Reactions: 29
  • Comments: 13 (7 by maintainers)

Top GitHub Comments

9 reactions
SunYanCN commented, Jun 9, 2019

This feature is really important and urgent. I look forward to it being added, and I will keep following this issue.

6 reactions
jamesmf commented, Jan 29, 2019

Note: the need for a tool like this is greater now that great datasets like SQuAD exist, because it’s more practical to train your own models. Given the fact that you can fine-tune BERT on SQuAD, it’d be invaluable to have a resource that lets you build your own SQuAD-formatted dataset that is domain specific!
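For reference, a SQuAD-formatted dataset nests each passage as a `context` with a list of `qas`, where every answer carries its text and a character offset (`answer_start`). The sketch below builds one such record in Python. The `to_squad` helper and the way it generates question ids are hypothetical, and it locates each answer with `str.find`, which only returns the first occurrence; an annotation tool would record the offset the annotator actually selected.

```python
import json


def to_squad(title, passage, qa_pairs):
    """Build a SQuAD v1.1-style dict from (question, answer_text) pairs."""
    qas = []
    for number, (question, answer_text) in enumerate(qa_pairs, start=1):
        start = passage.find(answer_text)  # first occurrence only (assumption)
        if start == -1:
            raise ValueError(f"answer {answer_text!r} not found in passage")
        qas.append(
            {
                "id": f"{title}-{number}",  # hypothetical id scheme
                "question": question,
                "answers": [{"text": answer_text, "answer_start": start}],
            }
        )
    return {
        "version": "1.1",
        "data": [
            {"title": title, "paragraphs": [{"context": passage, "qas": qas}]}
        ],
    }


passage = "The Eiffel Tower is in Paris. It was completed in 1889."
dataset = to_squad(
    "eiffel",
    passage,
    [("Where is the Eiffel Tower?", "Paris"), ("When was it completed?", "1889")],
)
squad_json = json.dumps(dataset, indent=2)
```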
