FillMaskPipeline very slow when provided with a large `targets`
Environment info
- transformers version: 4.6.1
- Platform: Linux-5.4.0-67-generic-x86_64-with-glibc2.10
- Python version: 3.8.5
- PyTorch version (GPU?): 1.8.1 (False)
- Tensorflow version (GPU?): N/A
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help
Information
The model I am using: ethanyt/guwenbert-base, which pairs a RoBERTa model with a BertTokenizerFast tokenizer.
To reproduce
Steps to reproduce the behavior:
- Initialize a fill-mask pipeline with the model and the tokenizer mentioned above
- Call it with any sentence and a large targets list (~10k single words); see the timing sketch below
Problem
The call is much slower than the same call without a targets argument: a call without targets costs ~0.1s, while a call with targets costs ~0.3s.
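A minimal timing sketch of the above. The model name and the two classes come from this report; the mask-token-only sentence and the vocab-derived target list are placeholders standing in for the real ~10k-word list:

```python
import time

from transformers import BertTokenizerFast, RobertaForMaskedLM, pipeline

model = RobertaForMaskedLM.from_pretrained("ethanyt/guwenbert-base")
tokenizer = BertTokenizerFast.from_pretrained("ethanyt/guwenbert-base")
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)

# Stand-in for the ~10k single-word targets list from the report.
targets = list(tokenizer.get_vocab())[:10000]
sentence = tokenizer.mask_token  # any sentence with exactly one mask token

start = time.time()
fill_mask(sentence)
print(f"without targets: {time.time() - start:.3f}s")

start = time.time()
fill_mask(sentence, targets=targets)
print(f"with targets:    {time.time() - start:.3f}s")
```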
The following code is present in src/transformers/pipelines/fill_mask.py:
```python
class FillMaskPipeline(Pipeline):
    # ...
    def __call__(self, *args, targets=None, top_k: Optional[int] = None, **kwargs):
        # ...
        if targets is not None:
            # ...
            targets_proc = []
            for target in targets:
                target_enc = self.tokenizer.tokenize(target)
                # ...
                targets_proc.append(target_enc[0])
```
This loop tokenizes each target individually instead of passing the whole list to the tokenizer in one call, so it never uses the batch-processing optimization of fast tokenizers (TokenizerFast), hence the slow speed.
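For illustration, a minimal standalone sketch of the batched alternative (the short target list stands in for the ~10k-word one; this is not the actual patch):

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("ethanyt/guwenbert-base")
targets = ["天", "地", "玄", "黄"]  # stand-in for the ~10k-word list

# A single batched call runs every target through the Rust fast-tokenizer
# backend at once, instead of one Python-level tokenize() call per target.
encodings = tokenizer(targets, add_special_tokens=False)["input_ids"]
# Keep the first resulting token of each target, mirroring target_enc[0] above.
targets_proc = [tokenizer.convert_ids_to_tokens(ids[0]) for ids in encodings if ids]
```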
Top GitHub Comments
I was able to reproduce this and optimize away most of the overhead; any example should now run at roughly the same speed.
A slowdown will still happen when a target misses the vocabulary, but the warnings should help users figure that out.
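The patch itself isn't quoted in this thread; one plausible shape of such an optimization, assuming the standard get_vocab() API, is to check each target against the vocab dict first and tokenize (and warn about) only the misses:

```python
import warnings

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("ethanyt/guwenbert-base")
vocab = tokenizer.get_vocab()  # token -> id; membership checks are O(1)

def resolve_targets(targets):
    resolved = []
    for target in targets:
        if target in vocab:  # fast path: plain dict hit, no tokenization
            resolved.append(target)
        else:  # slow path: tokenize only vocabulary misses, and warn
            tokens = tokenizer.tokenize(target)
            if tokens:
                warnings.warn(f"`{target}` not in vocab, using `{tokens[0]}` instead.")
                resolved.append(tokens[0])
    return resolved
```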
Thanks a lot. As background, I found the issue while reproducing the following paper:
which involves calling FillMaskPipeline iteratively (at most 10 times per API call), where, depending on the input, a call may or may not include the targets parameter. The time difference between the two types of API calls is what led me to this issue.