Add support for truncation argument when calling a Pipeline
🚀 Feature request
Currently, only the padding argument is supported when calling a pipeline, and it is not possible to pass a truncation argument. For example, running the following code sample raises an error:
import transformers as trf
model = trf.pipeline(task='feature-extraction', model='bert-base-cased')
output = model('a sample text', padding=False, truncation=True)
Motivation
If toggling padding is supported, why shouldn't truncation be?
Your contribution
I think that, just as for padding, achieving this only requires adding a truncation argument to the _parse_and_tokenize method and passing it along when calling the tokenizer. If that's the case, I would be willing to work on a PR.
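The proposed change can be sketched roughly as follows. This is only an illustration and not the actual transformers code: ToyTokenizer and ToyPipeline are hypothetical stand-ins, and the real _parse_and_tokenize signature may differ. The point is just that truncation is threaded through the same way as padding.

```python
from typing import Dict, List, Union


class ToyTokenizer:
    """Hypothetical whitespace tokenizer standing in for a real one."""

    def __init__(self, max_length: int = 8, pad_token: str = "[PAD]") -> None:
        self.max_length = max_length
        self.pad_token = pad_token

    def __call__(
        self, texts: List[str], padding: bool = True, truncation: bool = False
    ) -> Dict[str, List[List[str]]]:
        tokens = [text.split() for text in texts]
        if truncation:
            # Cut every sequence down to at most max_length tokens.
            tokens = [seq[: self.max_length] for seq in tokens]
        if padding:
            # Pad every sequence to the length of the longest one in the batch.
            longest = max(len(seq) for seq in tokens)
            tokens = [seq + [self.pad_token] * (longest - len(seq)) for seq in tokens]
        return {"tokens": tokens}


class ToyPipeline:
    """Hypothetical pipeline that forwards both arguments to its tokenizer."""

    def __init__(self, tokenizer: ToyTokenizer) -> None:
        self.tokenizer = tokenizer

    def _parse_and_tokenize(
        self, inputs: Union[str, List[str]], padding: bool = True, truncation: bool = False
    ) -> Dict[str, List[List[str]]]:
        if isinstance(inputs, str):
            inputs = [inputs]
        # truncation is passed along exactly like padding.
        return self.tokenizer(inputs, padding=padding, truncation=truncation)

    def __call__(
        self, inputs: Union[str, List[str]], padding: bool = True, truncation: bool = False
    ) -> Dict[str, List[List[str]]]:
        return self._parse_and_tokenize(inputs, padding=padding, truncation=truncation)


pipe = ToyPipeline(ToyTokenizer(max_length=4))
encoded = pipe("a sample text that is longer than four tokens", padding=False, truncation=True)
print(encoded["tokens"])
```

With this shape, the call from the feature request (padding=False, truncation=True) no longer fails on an unexpected keyword, because both arguments reach the tokenizer.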
Issue Analytics
- State:
- Created 3 years ago
- Reactions: 2
- Comments: 5 (2 by maintainers)
Hi, even though this has been closed as stale, without comment or supposed fix, it seems that in recent versions you can in fact pass both the truncation and padding arguments to the pipeline's __call__ method, and it will correctly use them when tokenizing. I've tested it with long texts that fail without the truncation argument, and it seems to work as expected.
+1 on this