Padding Strategy Code missing an else case (maybe?)
Environment info
- transformers version: 3.0.2
- Platform: macOS 10.15.5
- Python version: 3.7
- PyTorch version (GPU?): 1.5 (GPU: yes)
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help
- tokenizers: @mfuntowicz
- Summarization: @sshleifer
- T5: @patrickvonplaten
Information
Model I am using (T5 via AutoTokenizer):
The problem arises when using:
```python
tokenizer([line], max_length=max_length, padding='max_length' if pad_to_max_length else False,
          truncation=True, return_tensors=return_tensors, **extra_kw)
```
In batch encoding, the latest code decides on a padding strategy:
```python
def _get_padding_truncation_strategies(
    self, padding=False, truncation=False, max_length=None, pad_to_multiple_of=None, verbose=True, **kwargs
):
    ...
    elif padding is not False:
        if padding is True:
            padding_strategy = PaddingStrategy.LONGEST  # Default to pad to the longest sequence in the batch
        elif not isinstance(padding, PaddingStrategy):
            padding_strategy = PaddingStrategy(padding)
```
While calling the tokenizer, instead of the string `'max_length'` I passed the actual `PaddingStrategy.MAX_LENGTH` enum member as the argument, but the code above throws an error because `padding_strategy` is never assigned in that branch.
To reproduce
Call the tokenizer as:
```python
tokenizer([line], max_length=max_length, padding=PaddingStrategy.MAX_LENGTH if pad_to_max_length else False,
          truncation=True, return_tensors=return_tensors, **extra_kw)
```
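For completeness, a minimal self-contained version of the failing call might look like the sketch below; the checkpoint name (`t5-small`), the example sentence, and the import path for `PaddingStrategy` are assumptions on my part, based on the v3.0.x layout of `tokenization_utils_base`:

```python
from transformers import AutoTokenizer
from transformers.tokenization_utils_base import PaddingStrategy  # assumed import path for v3.0.x

# Illustrative checkpoint and input; any tokenizer should reproduce the issue
tokenizer = AutoTokenizer.from_pretrained("t5-small")
line = "translate English to German: Hello, world!"

batch = tokenizer(
    [line],
    max_length=32,
    padding=PaddingStrategy.MAX_LENGTH,  # enum member instead of the string 'max_length'
    truncation=True,
    return_tensors="pt",
)
# On 3.0.2 this is expected to fail, because the elif chain quoted above never
# assigns padding_strategy when padding is already a PaddingStrategy instance.
```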
Expected behavior
The `PaddingStrategy` enum should be accepted without issue.
Suggested Solution
```python
    elif padding is not False:
        if padding is True:
            padding_strategy = PaddingStrategy.LONGEST  # Default to pad to the longest sequence in the batch
        elif not isinstance(padding, PaddingStrategy):
            padding_strategy = PaddingStrategy(padding)
        else:
            padding_strategy = padding
```
It’s basically a one-line fix; I can raise a PR for it, unless `PaddingStrategy` wasn’t designed to be used directly?
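If the `else` branch is added, passing the enum member and passing the equivalent string should produce identical encodings. A rough usage sketch, reusing the `tokenizer` and `line` from the reproduction snippet above:

```python
# With the proposed else branch in place, these two calls should be equivalent:
enc_str = tokenizer([line], max_length=32, padding="max_length", truncation=True)
enc_enum = tokenizer([line], max_length=32, padding=PaddingStrategy.MAX_LENGTH, truncation=True)
assert enc_str["input_ids"] == enc_enum["input_ids"]
```

An explicit `else` also keeps the branch easy to read, since `padding_strategy` is then visibly assigned on every path.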
Issue Analytics
- Created 3 years ago
- Comments: 5 (4 by maintainers)
This issue also applies to the `truncation` parameter. I assumed the enums are supposed to be used directly because the release notes (https://github.com/huggingface/transformers/releases/tag/v3.0.0) explicitly mention the `TensorType` enum, which is defined right below the `PaddingStrategy` and `TruncationStrategy` enums. I agree that this is a problem that should be fixed, if the enums are meant to be used.
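For reference, the analogous enum-based call for the `truncation` parameter would look roughly like the sketch below; the import path and the setup are again my assumptions, and I have not traced the exact truncation branch in 3.0.2:

```python
from transformers import AutoTokenizer
from transformers.tokenization_utils_base import TruncationStrategy  # assumed import path for v3.0.x

tokenizer = AutoTokenizer.from_pretrained("t5-small")  # illustrative checkpoint
line = "translate English to German: Hello, world!"

batch = tokenizer(
    [line],
    max_length=32,
    padding="max_length",
    truncation=TruncationStrategy.LONGEST_FIRST,  # enum member instead of the string 'longest_first'
)
# If the truncation branch has the same missing-else pattern, this call is
# expected to fail in the same way as the padding case described above.
```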
Nice, thanks!