Batch inference runtime slows down for inputs with sentences of different lengths
Environment info
- `transformers` version: 4.6.1
- Platform: Ubuntu 18.04.5 LTS
- Python version: 3.6.9
- PyTorch version (GPU?): 1.8.1
- Tensorflow version (GPU?):
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no
Who can help
Information
Model I am using (Bert, XLNet …): LukeForEntityPairClassification
The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)
The task I am working on is:
- an official GLUE/SQuAD task: (give the name)
- my own task or dataset: (give details below)
To reproduce
Steps to reproduce the behavior:
- generate batched inputs for the LukeTokenizer with identical sentences in each batch (i.e. no padding required)
- tokenize each batch by passing the batch to the tokenizer
- run inference on each batch on GPU and notice that runtime is the same for each batch
- generate batched inputs for the LukeTokenizer with sentences of different length in each batch (i.e. padding is required)
- tokenize each batch by passing the batch to the tokenizer with `padding=True`
- run inference on each batch on GPU and notice that the runtime increases substantially for every batch after the first
Reproduction script:

```python
import torch
from transformers import LukeForEntityPairClassification, LukeTokenizer
import time

text1 = "Beyoncé lives in Los Angeles."
entity_spans1 = [(0, 7), (17, 28)]
text2 = "Kevin Love has urged the Cleveland Cavaliers to fight to regain their form following LeBron James' move to the Los Angeles Lakers."
entity_spans2 = [(85, 97), (111, 129)]

# experiment 1 - sentence length is identical across the full batch
text = [[text1] * 10, [text2] * 10]
entity_spans = [[entity_spans1] * 10, [entity_spans2] * 10]

model = LukeForEntityPairClassification.from_pretrained("studio-ousia/luke-large-finetuned-tacred")
tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-large-finetuned-tacred")

tokenized_inputs = []
for text_batch, span_batch in zip(text, entity_spans):
    inputs = tokenizer(text_batch, entity_spans=span_batch, return_tensors="pt", padding=True, truncation=True)
    tokenized_inputs.append(inputs)

device = torch.device('cuda')
model.to(device)
model.eval()

for i, batch in enumerate(tokenized_inputs):
    with torch.no_grad():
        start = time.time()
        batch.to(device)
        outputs = model(**batch)
        print(f"runtime batch {i}: ", time.time() - start)

# experiment 2 - sentence length alternates across the batch
text = [[text1, text2] * 10] * 2
entity_spans = [[entity_spans1, entity_spans2] * 10] * 2

model = LukeForEntityPairClassification.from_pretrained("studio-ousia/luke-large-finetuned-tacred")
tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-large-finetuned-tacred")

tokenized_inputs = []
for text_batch, span_batch in zip(text, entity_spans):
    inputs = tokenizer(text_batch, entity_spans=span_batch, return_tensors="pt", padding=True, truncation=True)
    tokenized_inputs.append(inputs)

device = torch.device('cuda')
model.to(device)
model.eval()

for i, batch in enumerate(tokenized_inputs):
    with torch.no_grad():
        start = time.time()
        batch.to(device)
        outputs = model(**batch)
        print(f"runtime batch {i}: ", time.time() - start)
```
Results (Tesla T4):

```
# experiment 1
runtime batch 0: 0.028860092163085938
runtime batch 1: 0.03273129463195801

# experiment 2
runtime batch 0: 0.028328895568847656
runtime batch 1: 0.09934639930725098
```
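One thing worth checking before comparing the two experiments: the padded batches are not the same size. In experiment 1 each batch holds 10 identical sentences, while in experiment 2 each batch holds 20 sentences padded to the length of the longer one, so the per-batch compute differs. A quick sanity check is just to print the shapes of whatever tensors the tokenizer returned:

```python
# Inspect the padded shapes of each tokenized batch
# (uses the `tokenized_inputs` list built in the script above).
for i, batch in enumerate(tokenized_inputs):
    shapes = {name: tuple(tensor.shape) for name, tensor in batch.items()}
    print(f"batch {i}: {shapes}")
```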
Expected behavior
I expect the runtime to be the same when an identical batch of inputs is run a second time (as in experiment 2, where batch 0 and batch 1 contain exactly the same sentences).
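For what it's worth, there may be a confound in the timing itself (an assumption on my part, not verified for LUKE specifically): CUDA kernels are launched asynchronously, so `time.time()` read right after `model(**batch)` can return before the GPU has finished, and the remaining work then shows up in the measurement of the next batch. A minimal variant of the timing loop that synchronizes before reading the clock:

```python
import time
import torch

# Same loop as above, but synchronize the GPU so each measurement covers
# only the work of its own batch. Assumes `model`, `device`, and
# `tokenized_inputs` are defined as in the reproduction script.
for i, batch in enumerate(tokenized_inputs):
    batch.to(device)
    with torch.no_grad():
        torch.cuda.synchronize()
        start = time.time()
        outputs = model(**batch)
        torch.cuda.synchronize()
        elapsed = time.time() - start
    print(f"synchronized runtime batch {i}: {elapsed:.4f} s")
```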
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Pinging @NielsRogge as he might have an idea of what’s going on with LUKE
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.