Minimal answer spans are wrong for some examples.
When I read the data and print the questions with their minimal answer spans, I noticed that for some examples the plaintext_start_byte and plaintext_end_byte of the minimal answer span are shifted k symbols to the right.
Example: Question: ‘Who created the series Clannad?’ Minimal span: ‘rel’ (it should be ‘Key’, but its values are shifted and it gives ‘rel’ part of the word ‘released’)
... Clannad(クラナド,Kuranado) is a Japanese visual novel developed by Key and released on April 28, 2004 ...
This is how I read the data file:
with open(path_name) as input_file:
    for line in input_file:
        try:
            json_example = json.loads(line)
            if json_example['language'] not in allowed_langs:
                continue
            plain_text = json_example['document_plaintext']
            ...
I suspect that you’re slicing the Python string by characters instead of by bytes. In Python 3, string indices count Unicode characters (unlike Python 2, where they counted bytes). The example you gave contains non-ASCII characters that take up more than one byte each, so the byte indices and the character indices are not the same.
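A minimal sketch of the difference, assuming the offsets refer to the UTF-8 encoding of document_plaintext. The helper slice_by_bytes and the shortened sentence are hypothetical and only illustrate the idea; the offsets are computed from the text here rather than taken from the dataset.

```python
def slice_by_bytes(text: str, start_byte: int, end_byte: int) -> str:
    """Return the text covered by UTF-8 byte offsets [start_byte, end_byte).

    Hypothetical helper; assumes the offsets index the UTF-8 bytes of `text`.
    """
    return text.encode("utf-8")[start_byte:end_byte].decode("utf-8")


# Shortened stand-in for document_plaintext, modelled on the example above.
plain_text = ("Clannad(クラナド,Kuranado) is a Japanese visual novel "
              "developed by Key and released on April 28, 2004")

# Derive the byte offsets of "Key" the way the dataset fields would store them.
char_start = plain_text.index("Key")
start_byte = len(plain_text[:char_start].encode("utf-8"))
end_byte = start_byte + len("Key".encode("utf-8"))

# Using byte offsets as character indices overshoots, because each of the
# four katakana characters occupies three bytes but only one character.
wrong = plain_text[start_byte:end_byte]

# Slicing the encoded bytes and decoding the result gives the intended span.
right = slice_by_bytes(plain_text, start_byte, end_byte)

print(repr(wrong))  # 'rel'
print(repr(right))  # 'Key'
```

In the reading loop, the same idea would be to encode plain_text once per example and slice the resulting bytes with plaintext_start_byte and plaintext_end_byte before decoding.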
I think @tomohideshibata's code works fine.