Add support for truncation argument when calling a Pipeline

🚀 Feature request

Currently, only the padding argument is supported when calling a pipeline; it is not possible to pass a truncation argument. For example, running the following code sample raises an error:

import transformers as trf

model = trf.pipeline(task='feature-extraction', model='bert-base-cased')
# At the time of this request, this call raises an error because truncation is not accepted
output = model('a sample text', padding=False, truncation=True)

Motivation

If toggling padding is supported, why shouldn’t truncation be?

Your contribution

I think this can be achieved the same way as padding: add a truncation argument to the _parse_and_tokenize method and forward it when calling the tokenizer. If that’s the case, I would be willing to work on a PR.
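To make the proposal concrete, here is a minimal sketch of the forwarding pattern. It is not the transformers implementation: ToyTokenizer and ToyPipeline are hypothetical stand-ins written for this example, the default values are assumptions, and only the method name _parse_and_tokenize is taken from the description above.

```python
from typing import Dict, List, Union


class ToyTokenizer:
    """Hypothetical stand-in for a real tokenizer: splits text on whitespace."""

    def __init__(self, model_max_length: int = 8) -> None:
        self.model_max_length = model_max_length

    def __call__(
        self, texts: List[str], padding: bool = False, truncation: bool = False
    ) -> Dict[str, List[List[str]]]:
        tokens = [text.split() for text in texts]
        if truncation:
            # Cut every sequence down to the maximum length the model accepts.
            tokens = [seq[: self.model_max_length] for seq in tokens]
        if padding:
            # Pad every sequence to the length of the longest one in the batch.
            longest = max(len(seq) for seq in tokens)
            tokens = [seq + ["[PAD]"] * (longest - len(seq)) for seq in tokens]
        return {"tokens": tokens}


class ToyPipeline:
    """Hypothetical pipeline that forwards both arguments to its tokenizer."""

    def __init__(self, tokenizer: ToyTokenizer) -> None:
        self.tokenizer = tokenizer

    def _parse_and_tokenize(
        self,
        inputs: Union[str, List[str]],
        padding: bool = True,      # assumed default, for illustration only
        truncation: bool = False,  # the newly added argument
    ) -> Dict[str, List[List[str]]]:
        if isinstance(inputs, str):
            inputs = [inputs]
        # The proposed change: pass truncation along, exactly like padding.
        return self.tokenizer(inputs, padding=padding, truncation=truncation)

    def __call__(self, inputs: Union[str, List[str]], **kwargs: bool):
        return self._parse_and_tokenize(inputs, **kwargs)


pipe = ToyPipeline(ToyTokenizer(model_max_length=3))
result = pipe("a sample text that is too long", padding=False, truncation=True)
```

With this shape, the call site from the code sample above (padding=False, truncation=True) needs no further changes, since both keyword arguments simply pass through __call__ to the tokenizer.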

Issue Analytics

State:
Created 3 years ago
Reactions: 2
Comments: 5 (2 by maintainers)

Top GitHub Comments

3 reactions

buhrmann commented, Jul 30, 2021

Hi, even though this was closed as stale, without a comment or an apparent fix, it seems that in recent versions you can in fact pass both truncation and padding arguments to the pipeline’s __call__ method, and they are correctly applied when tokenizing. I’ve tested this with long texts that fail without the truncation argument, and it seems to work as expected.

0 reactions

alexbw commented, Feb 7, 2021

+1 on this