QuestionAnsweringPipeline query performance

See original GitHub issue

This is my first issue posted here, so first off, thank you for building this library; it's really pushing NLP forward.

The current QuestionAnsweringPipeline relies on the method squad_convert_examples_to_features to convert question/context pairs to SquadFeatures. Reviewing that method, it looks like it spawns a new process for each example it converts.
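
For reference, here is a minimal sketch of that conversion step in isolation (assuming the transformers 3.x signatures of squad_convert_examples_to_features and SquadExample); even with threads=1, each call appears to set up a multiprocessing pool, which is the fixed per-call cost:

import time

from transformers import squad_convert_examples_to_features, AutoTokenizer
from transformers.data.processors.squad import SquadExample

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")

example = SquadExample(
    qas_id="0",
    question_text="How long did the process take?",
    context_text="The process took an average of 36.555 seconds.",
    answer_text=None,
    start_position_character=None,
    title="",
)

start = time.time()
# Even for a single example, this call creates a multiprocessing pool,
# which seems to be the per-call overhead the timings below are measuring.
features = squad_convert_examples_to_features(
    examples=[example],
    tokenizer=tokenizer,
    max_seq_length=384,
    doc_stride=128,
    max_query_length=64,
    is_training=False,
    threads=1,
)
print("Features", len(features), "in", time.time() - start, "s")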

That per-example process creation is causing performance issues when trying to support near real-time or bulk queries. As a workaround, I can issue the queries directly against the model, but the pipeline has a lot of nice logic for formatting answers properly and for pulling the best answer rather than just the start/end argmax.

Please see the results of a rudimentary performance test to demonstrate:

import time

from transformers import pipeline

context = r"""
The extractive question answering process took an average of 36.555 seconds using pipelines and about 2 seconds when
queried directly using the models.
"""
question = "How long did the process take?"

nlp = pipeline("question-answering", model="distilbert-base-cased-distilled-squad", tokenizer="distilbert-base-cased-distilled-squad")

start = time.time()
for x in range(100):
    answer = nlp(question=question, context=context)

print("Answer", answer)
print("Time", time.time() - start, "s")

Output:

Answer {'score': 0.8029816785368773, 'start': 62, 'end': 76, 'answer': '36.555 seconds'}
Time 36.703474044799805 s

import time

import torch

from transformers import AutoModelForQuestionAnswering, AutoTokenizer

# question and context are reused from the first snippet

model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")

start = time.time()
for x in range(100):
    inputs = tokenizer.encode_plus(question, context, add_special_tokens=True, return_tensors="pt")
    input_ids = inputs["input_ids"].tolist()[0]

    text_tokens = tokenizer.convert_ids_to_tokens(input_ids)
    answer_start_scores, answer_end_scores = model(**inputs)

    answer_start = torch.argmax(
        answer_start_scores
    )  # Get the most likely beginning of answer with the argmax of the score
    answer_end = torch.argmax(answer_end_scores) + 1  # Get the most likely end of answer with the argmax of the score

    answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))

print("Answer", answer)
print("Time", time.time() - start, "s")

Output:

Answer 36 . 555 seconds
Time 2.1718859672546387 s

I believe the slowdown (roughly 17x in this test) comes from the first example having to spawn a new process for each of the 100 calls. I also tried passing a list of 100 question/context pairs in a single call to see if that was better, and that took ~28s. But for this use case, all 100 questions wouldn't be available at once.
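
For completeness, that batched run looked roughly like the sketch below; it assumes the question-answering pipeline accepts parallel lists for question and context (as it did at the time of this issue), and exact timings will vary with hardware and library version.

import time

from transformers import pipeline

context = "The extractive question answering process took an average of 36.555 seconds using pipelines."
question = "How long did the process take?"

nlp = pipeline("question-answering", model="distilbert-base-cased-distilled-squad", tokenizer="distilbert-base-cased-distilled-squad")

start = time.time()
# One call with 100 question/context pairs: the squad processor only sets up
# its process pool once, instead of once per query.
answers = nlp(question=[question] * 100, context=[context] * 100)

print("First answer", answers[0])
print("Time", time.time() - start, "s")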

The additional logic for answer extraction doesn't come for free, but it doesn't add much overhead. The third test below uses a custom pipeline component to demonstrate this.

import time

from cord19q.pipeline import Pipeline

pipeline = Pipeline("distilbert-base-cased-distilled-squad", False)

start = time.time()
for x in range(100):
    answer = pipeline([question], [context])

print("\nAnswer", answer)
print("Time", time.time() - start, "s")

Output:

Answer [{'answer': '36.555 seconds', 'score': 0.8029860216482803}]
Time 2.219379186630249 s

It would be great if the QuestionAnsweringPipeline either didn't use the squad processor, or if the processor gained an argument to avoid spawning processes.
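
To make the request concrete, below is a hypothetical sketch of what a processor-free path could look like for single-question queries. The answer_question helper is illustrative only, not the pipeline's actual post-processing: it skips doc-stride handling and character offsets, assumes the tuple-returning model API used in the snippets above, and simply adds a softmax so the score roughly resembles the pipeline's probability.

import torch

from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-cased-distilled-squad")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased-distilled-squad")

def answer_question(question, context):
    # Tokenize in-process: no SquadExample conversion, no multiprocessing pool.
    inputs = tokenizer.encode_plus(question, context, add_special_tokens=True, return_tensors="pt")

    with torch.no_grad():
        # Assumes the tuple-returning API of the transformers version used above.
        start_logits, end_logits = model(**inputs)

    # Softmax the logits so the score is a probability rather than a raw logit.
    start_probs = torch.softmax(start_logits, dim=1)
    end_probs = torch.softmax(end_logits, dim=1)

    start = int(torch.argmax(start_probs))
    end = int(torch.argmax(end_probs)) + 1

    input_ids = inputs["input_ids"][0].tolist()
    answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[start:end]))

    return {"answer": answer, "score": float(start_probs[0, start] * end_probs[0, end - 1])}

print(answer_question("How long did the process take?", "The process took an average of 36.555 seconds."))

In the tests above, this kind of direct path is what takes the 100-query loop from ~36s down to ~2s.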

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments:11 (8 by maintainers)

Top GitHub Comments

4 reactions
LysandreJik commented, Aug 3, 2020

Hi @davidmezzetti, just to let you know we’re working towards a bigger pipeline refactor, with a strong focus on performance. Let’s keep this issue open while it’s still in the works in case more is to be said on the matter.

1 reaction
LysandreJik commented, Dec 7, 2020

Glad to hear it!

