KeyError in Pipeline Question Answering with Longformer

See original GitHub issue

I’m trying to do QA with Longformer in a Pipeline. First of all, I generate the pipeline:

MODEL_STR = "mrm8488/longformer-base-4096-finetuned-squadv2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_STR)
model = AutoModelForQuestionAnswering.from_pretrained(MODEL_STR)
QA = pipeline('question-answering', model=model, tokenizer=tokenizer)

Then, I get the paper text I want the answer to come from, named my_article: a string containing the full body of the article (around 3,000 words). Then, I try:

with torch.no_grad():
    answer = QA(question=question, context=articles_abstract.body_text.iloc[0])

And it throws the following error:

KeyError                                  Traceback (most recent call last)
<ipython-input-53-b5f8dc0503c8> in <module>
      1 with torch.no_grad():
----> 2     answer = QA(question=question, context=articles_abstract.body_text.iloc[0])

~/miniconda/envs/transformers_env/lib/python3.7/site-packages/transformers/pipelines.py in __call__(self, *args, **kwargs)
   1225                 ),
   1226             }
-> 1227             for s, e, score in zip(starts, ends, scores)
   1228         ]
   1229

~/miniconda/envs/transformers_env/lib/python3.7/site-packages/transformers/pipelines.py in <listcomp>(.0)
   1225                 ),
   1226             }
-> 1227             for s, e, score in zip(starts, ends, scores)
   1228         ]
   1229

KeyError: 382

How can I solve this issue? More importantly, what do you think is causing the issue?

Thanks in advance! 😃
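
A first thing worth checking, assuming the tokenizer and context from the snippets above are in scope, is whether the tokenized article exceeds this Longformer checkpoint’s 4,096-token window; over-long contexts are a common trigger for this kind of span-mapping KeyError in the pipeline. A minimal sketch:

# Sketch: count the tokens the context produces once tokenized,
# reusing the objects defined above (`tokenizer`, `articles_abstract`)
context = articles_abstract.body_text.iloc[0]
n_tokens = len(tokenizer.encode(context))
print(n_tokens)  # anything above 4096 exceeds longformer-base-4096's window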

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

3 reactions
patil-suraj commented, Jun 5, 2020

@alexvaca0

Please check which architecture you are using, then look up that model’s QA class in the docs; the docs include an example of how to use it without the pipeline. For example, if your architecture is BERT, the class is BertForQuestionAnswering, and you’ll find the example in that model’s documentation. Basically, what you’ll need to do is this:

# import your model class, you can also use AutoModelForQuestionAnswering and AutoTokenizer
from transformers import BertTokenizer, BertForQuestionAnswering
import torch

# load the model and tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

# encode the question and text
question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
encoding = tokenizer.encode_plus(question, text)
input_ids, token_type_ids = encoding["input_ids"], encoding["token_type_ids"]

# do the forward pass, each qa model returns start_scores, end_scores
start_scores, end_scores = model(torch.tensor([input_ids]), token_type_ids=torch.tensor([token_type_ids]))

# extract the span
all_tokens = tokenizer.convert_ids_to_tokens(input_ids)
answer = ' '.join(all_tokens[torch.argmax(start_scores) : torch.argmax(end_scores)+1])

assert answer == "a nice puppet"

Hope this helps you.
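
Since the original question uses a Longformer checkpoint rather than BERT, the same pattern adapted to that model could look like the sketch below (assuming a transformers version from around the time of this issue, where QA models return a (start_scores, end_scores) tuple):

# load the Longformer QA checkpoint from the original question
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch

MODEL_STR = "mrm8488/longformer-base-4096-finetuned-squadv2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_STR)
model = AutoModelForQuestionAnswering.from_pretrained(MODEL_STR)

# encode the question and text together
question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
encoding = tokenizer.encode_plus(question, text, return_tensors="pt")

# forward pass; LongformerForQuestionAnswering puts global attention
# on the question tokens automatically when no global_attention_mask is given
with torch.no_grad():
    start_scores, end_scores = model(**encoding)

# extract the answer span from the most likely start/end positions
all_tokens = tokenizer.convert_ids_to_tokens(encoding["input_ids"][0].tolist())
span = all_tokens[torch.argmax(start_scores) : torch.argmax(end_scores) + 1]
answer = tokenizer.convert_tokens_to_string(span)
print(answer)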

0 reactions
stale[bot] commented, Aug 4, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
