KeyError in Pipeline Question Answering with Longformer

See original GitHub issue

I’m trying to do QA with Longformer in a Pipeline. First of all, I generate the pipeline:

MODEL_STR = "mrm8488/longformer-base-4096-finetuned-squadv2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_STR)
model = AutoModelForQuestionAnswering.from_pretrained(MODEL_STR)
QA = pipeline('question-answering', model=model, tokenizer=tokenizer)

Then, I get the paper text I want the answer to come from, named my_article: a string containing the full body of the article (around 3,000 words). Then, I try:

with torch.no_grad():
    answer = QA(question=question, context=articles_abstract.body_text.iloc[0])

And it throws the following error:

KeyError                                  Traceback (most recent call last)
<ipython-input-53-b5f8dc0503c8> in <module>
      1 with torch.no_grad():
----> 2     answer = QA(question=question, context=articles_abstract.body_text.iloc[0])

~/miniconda/envs/transformers_env/lib/python3.7/site-packages/transformers/pipelines.py in __call__(self, *args, **kwargs)
   1225                 ),
   1226             }
-> 1227             for s, e, score in zip(starts, ends, scores)
   1228         ]
   1229

~/miniconda/envs/transformers_env/lib/python3.7/site-packages/transformers/pipelines.py in <listcomp>(.0)
   1225                 ),
   1226             }
-> 1227             for s, e, score in zip(starts, ends, scores)
   1228         ]
   1229

KeyError: 382

How can I solve this issue? More importantly, what do you think is causing the issue?

Thanks in advance! 😃
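
A first thing worth checking, assuming the tokenizer and context from the snippets above are in scope, is whether the tokenized article exceeds this Longformer checkpoint’s 4,096-token window; over-long contexts are a common trigger for this kind of span-mapping KeyError in the pipeline. A minimal sketch:

# Sketch: count the tokens the context produces once tokenized,
# reusing the objects defined above (`tokenizer`, `articles_abstract`)
context = articles_abstract.body_text.iloc[0]
n_tokens = len(tokenizer.encode(context))
print(n_tokens)  # anything above 4096 exceeds longformer-base-4096's window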

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

3 reactions
patil-suraj commented, Jun 5, 2020

@alexvaca0

Please check which architecture you are using, then look up that model’s QA class in the docs; the docs include an example of how to use it without the pipeline. For example, if your architecture is BERT, the class is BertForQuestionAnswering, and you’ll find the example in that model’s documentation. Basically, what you’ll need to do is this:

# import your model class, you can also use AutoModelForQuestionAnswering and AutoTokenizer
from transformers import BertTokenizer, BertForQuestionAnswering
import torch

# load the model and tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

# encode the question and text
question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
encoding = tokenizer.encode_plus(question, text)
input_ids, token_type_ids = encoding["input_ids"], encoding["token_type_ids"]

# do the forward pass, each qa model returns start_scores, end_scores
start_scores, end_scores = model(torch.tensor([input_ids]), token_type_ids=torch.tensor([token_type_ids]))

# extract the span
all_tokens = tokenizer.convert_ids_to_tokens(input_ids)
answer = ' '.join(all_tokens[torch.argmax(start_scores) : torch.argmax(end_scores)+1])

assert answer == "a nice puppet"

Hope this helps you.
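
Since the original question uses a Longformer checkpoint rather than BERT, the same pattern adapted to that model could look like the sketch below (assuming a transformers version from around the time of this issue, where QA models return a (start_scores, end_scores) tuple):

# load the Longformer QA checkpoint from the original question
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch

MODEL_STR = "mrm8488/longformer-base-4096-finetuned-squadv2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_STR)
model = AutoModelForQuestionAnswering.from_pretrained(MODEL_STR)

# encode the question and text together
question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
encoding = tokenizer.encode_plus(question, text, return_tensors="pt")

# forward pass; LongformerForQuestionAnswering puts global attention
# on the question tokens automatically when no global_attention_mask is given
with torch.no_grad():
    start_scores, end_scores = model(**encoding)

# extract the answer span from the most likely start/end positions
all_tokens = tokenizer.convert_ids_to_tokens(encoding["input_ids"][0].tolist())
span = all_tokens[torch.argmax(start_scores) : torch.argmax(end_scores) + 1]
answer = tokenizer.convert_tokens_to_string(span)
print(answer)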

0 reactions
stale[bot] commented, Aug 4, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
