FillMaskPipeline very slow when provided with a large `targets`
Environment info
- transformers version: 4.6.1
- Platform: Linux-5.4.0-67-generic-x86_64-with-glibc2.10
- Python version: 3.8.5
- PyTorch version (GPU?): 1.8.1 (False)
- Tensorflow version (GPU?): N/A
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help
Information
The model I am using: ethanyt/guwenbert-base, which pairs a RoBERTa model with a BertTokenizerFast tokenizer.
To reproduce
Steps to reproduce the behavior:
- Initialize a fill-mask pipeline with the model and the tokenizer mentioned above
- Call it with any sentence and a large targets list (~10k single words); see the timing sketch below
Problem
The call is much slower than the same call without a targets argument: a call without targets costs ~0.1s, while a call with targets costs ~0.3s.
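A minimal timing sketch of the above. The model name and the two classes come from this report; the mask-token-only sentence and the vocab-derived target list are placeholders standing in for the real ~10k-word list:

```python
import time

from transformers import BertTokenizerFast, RobertaForMaskedLM, pipeline

model = RobertaForMaskedLM.from_pretrained("ethanyt/guwenbert-base")
tokenizer = BertTokenizerFast.from_pretrained("ethanyt/guwenbert-base")
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)

# Stand-in for the ~10k single-word targets list from the report.
targets = list(tokenizer.get_vocab())[:10000]
sentence = tokenizer.mask_token  # any sentence with exactly one mask token

start = time.time()
fill_mask(sentence)
print(f"without targets: {time.time() - start:.3f}s")

start = time.time()
fill_mask(sentence, targets=targets)
print(f"with targets:    {time.time() - start:.3f}s")
```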
The following code is present in src/transformers/pipelines/fill_mask.py:
```python
class FillMaskPipeline(Pipeline):
    # ...
    def __call__(self, *args, targets=None, top_k: Optional[int] = None, **kwargs):
        # ...
        if targets is not None:
            # ...
            targets_proc = []
            for target in targets:
                target_enc = self.tokenizer.tokenize(target)
                # ...
                targets_proc.append(target_enc[0])
```
This loop tokenizes each target individually instead of passing the whole list to the tokenizer in one call, so it never uses the batch-processing optimization of fast tokenizers (TokenizerFast), hence the slow speed.
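For illustration, a minimal standalone sketch of the batched alternative (the short target list stands in for the ~10k-word one; this is not the actual patch):

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("ethanyt/guwenbert-base")
targets = ["天", "地", "玄", "黄"]  # stand-in for the ~10k-word list

# A single batched call runs every target through the Rust fast-tokenizer
# backend at once, instead of one Python-level tokenize() call per target.
encodings = tokenizer(targets, add_special_tokens=False)["input_ids"]
# Keep the first resulting token of each target, mirroring target_enc[0] above.
targets_proc = [tokenizer.convert_ids_to_tokens(ids[0]) for ids in encodings if ids]
```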
Top GitHub Comments
I was able to reproduce this and optimize away most of the overhead; any example should now run at roughly the same speed.
A slowdown will still happen when a target misses the vocabulary, but the warnings should help users figure that out.
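The patch itself isn't quoted in this thread; one plausible shape of such an optimization, assuming the standard get_vocab() API, is to check each target against the vocab dict first and tokenize (and warn about) only the misses:

```python
import warnings

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("ethanyt/guwenbert-base")
vocab = tokenizer.get_vocab()  # token -> id; membership checks are O(1)

def resolve_targets(targets):
    resolved = []
    for target in targets:
        if target in vocab:  # fast path: plain dict hit, no tokenization
            resolved.append(target)
        else:  # slow path: tokenize only vocabulary misses, and warn
            tokens = tokenizer.tokenize(target)
            if tokens:
                warnings.warn(f"`{target}` not in vocab, using `{tokens[0]}` instead.")
                resolved.append(tokens[0])
    return resolved
```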
Thanks a lot. As background, I found the issue while reproducing the following paper:
which involves calling FillMaskPipeline iteratively (at most 10 times per API call), where, depending on the input, a call may or may not include the targets parameter. The time difference between the two types of API calls is what led me to this issue.