Padding Strategy Code missing an else case (maybe?)

Environment info

transformers version: 3.0.2
Platform: macOS 10.15.5
Python version: 3.7
PyTorch version (GPU?): 1.5 GPU-Yes
Using GPU in script?: No
Using distributed or parallel set-up in script?: No

Who can help

tokenizers: @mfuntowicz
Summarization: @sshleifer
T5: @patrickvonplaten

Information

Model I am using: T5 (loaded via `AutoTokenizer`)

The problem arises when calling the tokenizer as follows:

`tokenizer([line], max_length=max_length, padding='max_length' if pad_to_max_length else False, truncation=True, return_tensors=return_tensors, **extra_kw)`

During batch encoding, the latest code chooses the padding strategy in `_get_padding_truncation_strategies(self, padding=False, truncation=False, max_length=None, pad_to_multiple_of=None, verbose=True, **kwargs)`. The relevant branch is:

```python
# excerpt from _get_padding_truncation_strategies; earlier branches omitted
elif padding is not False:
    if padding is True:
        padding_strategy = PaddingStrategy.LONGEST  # Default to pad to the longest sequence in the batch
    elif not isinstance(padding, PaddingStrategy):
        padding_strategy = PaddingStrategy(padding)
```

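When calling the tokenizer, I first passed the enum member `PaddingStrategy.MAX_LENGTH` instead of the string `'max_length'`, but the code above then fails because `padding_strategy` is never assigned: an argument that is already a `PaddingStrategy` matches neither branch, so Python raises an `UnboundLocalError` when the variable is read later.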
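
To see why the error occurs, here is a minimal, self-contained sketch of the same branch structure. `PaddingStrategy` below is a hypothetical stand-in (a plain `enum.Enum` whose member names and string values are assumed to mirror the library's), not the transformers class, and `get_padding_strategy_buggy` is a hypothetical helper. The outer `else` that maps `padding=False` to `DO_NOT_PAD` is an assumption added only so the function is complete. Because `padding_strategy` is assigned inside some branches, Python treats it as a local variable, and reading it when no branch ran raises `UnboundLocalError`.

```python
from enum import Enum


class PaddingStrategy(Enum):
    """Hypothetical stand-in for the library's PaddingStrategy enum."""
    LONGEST = "longest"
    MAX_LENGTH = "max_length"
    DO_NOT_PAD = "do_not_pad"


def get_padding_strategy_buggy(padding=False):
    """Hypothetical helper mirroring the quoted branch structure (no else case)."""
    if padding is not False:
        if padding is True:
            padding_strategy = PaddingStrategy.LONGEST
        elif not isinstance(padding, PaddingStrategy):
            padding_strategy = PaddingStrategy(padding)  # string lookup, e.g. "max_length"
        # An argument that is already a PaddingStrategy member matches no branch here.
    else:
        padding_strategy = PaddingStrategy.DO_NOT_PAD  # assumed fallback for padding=False
    return padding_strategy  # unbound when padding was an enum member


try:
    get_padding_strategy_buggy(PaddingStrategy.MAX_LENGTH)
    error_name = None
except UnboundLocalError as err:
    error_name = type(err).__name__

print(error_name)  # UnboundLocalError
```

Strings and `True` still resolve normally; only an argument that is already an enum member falls through every branch.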

To reproduce

Call the tokenizer with the enum member instead of the string:

`tokenizer([line], max_length=max_length, padding=PaddingStrategy.MAX_LENGTH if pad_to_max_length else False, truncation=True, return_tensors=return_tensors, **extra_kw)`

Expected behavior

Passing the `PaddingStrategy` enum member should work exactly like passing the equivalent string, with `padding_strategy` assigned without error.
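
For a standard Python `Enum`, looking a member up by its value and calling the enum on a value that is already one of its members both return the same singleton member, so the string form and the enum form can be treated interchangeably. The sketch below uses a hypothetical stand-in `PaddingStrategy` (a plain `enum.Enum`, values assumed to mirror the library's), not the transformers class.

```python
from enum import Enum


class PaddingStrategy(Enum):
    """Hypothetical stand-in for the library's PaddingStrategy enum."""
    LONGEST = "longest"
    MAX_LENGTH = "max_length"
    DO_NOT_PAD = "do_not_pad"


# Lookup by string value returns the existing member.
from_string = PaddingStrategy("max_length")
# Calling the enum on something that is already a member returns it unchanged.
from_member = PaddingStrategy(PaddingStrategy.MAX_LENGTH)

print(from_string is PaddingStrategy.MAX_LENGTH, from_member is PaddingStrategy.MAX_LENGTH)  # True True
```

One consequence, at least for a plain `Enum`: an unconditional `PaddingStrategy(padding)` would also accept a member. Whether the library's own enum base class behaves identically is an assumption worth checking before relying on it.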

Suggested solution

```python
elif padding is not False:
    if padding is True:
        padding_strategy = PaddingStrategy.LONGEST  # Default to pad to the longest sequence in the batch
    elif not isinstance(padding, PaddingStrategy):
        padding_strategy = PaddingStrategy(padding)
    else:
        padding_strategy = padding
```
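
A minimal sketch of the suggested fix, again using a hypothetical stand-in `PaddingStrategy` and a hypothetical helper `get_padding_strategy` rather than the library code, with an assumed `DO_NOT_PAD` fallback for `padding=False`. The added `else` returns the argument unchanged when it is already an enum member, so `True`, the string `'max_length'`, and `PaddingStrategy.MAX_LENGTH` all resolve without error.

```python
from enum import Enum


class PaddingStrategy(Enum):
    """Hypothetical stand-in for the library's PaddingStrategy enum."""
    LONGEST = "longest"
    MAX_LENGTH = "max_length"
    DO_NOT_PAD = "do_not_pad"


def get_padding_strategy(padding=False):
    """Hypothetical helper: the padding branch with the suggested else case."""
    if padding is not False:
        if padding is True:
            padding_strategy = PaddingStrategy.LONGEST  # pad to the longest sequence in the batch
        elif not isinstance(padding, PaddingStrategy):
            padding_strategy = PaddingStrategy(padding)  # look up by string value
        else:
            padding_strategy = padding  # already an enum member: use it as-is
    else:
        padding_strategy = PaddingStrategy.DO_NOT_PAD  # assumed fallback for padding=False
    return padding_strategy


for arg in (True, "max_length", PaddingStrategy.MAX_LENGTH, False):
    print(repr(arg), "->", get_padding_strategy(arg))
```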

It’s basically a one-line fix. I can raise a PR for it, unless `PaddingStrategy` wasn’t designed to be used directly?

Top GitHub Comments

aphedges commented, Sep 4, 2020

This issue also applies to the truncation parameter.

I assumed the enums are supposed to be used directly because the release notes (https://github.com/huggingface/transformers/releases/tag/v3.0.0) explicitly mention the TensorType enum, which is defined right below the PaddingStrategy and TruncationStrategy enums.

I agree that this is a problem that should be fixed, if the enums are meant to be used.
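
The same normalization could be applied to the `truncation` argument. The sketch below is hypothetical: `TruncationStrategy` is a stand-in whose member names and string values are assumed to mirror the library's, `get_truncation_strategy` is an illustrative helper, and mapping `truncation=True` to `LONGEST_FIRST` is an assumption about the default.

```python
from enum import Enum


class TruncationStrategy(Enum):
    """Hypothetical stand-in for the library's TruncationStrategy enum."""
    ONLY_FIRST = "only_first"
    ONLY_SECOND = "only_second"
    LONGEST_FIRST = "longest_first"
    DO_NOT_TRUNCATE = "do_not_truncate"


def get_truncation_strategy(truncation=False):
    """Hypothetical helper accepting False, True, a string value, or an enum member."""
    if truncation is False:
        return TruncationStrategy.DO_NOT_TRUNCATE
    if truncation is True:
        return TruncationStrategy.LONGEST_FIRST  # assumed default for truncation=True
    if isinstance(truncation, TruncationStrategy):
        return truncation  # already an enum member: use it as-is
    return TruncationStrategy(truncation)  # look up by string value; ValueError if unknown


for arg in (True, "only_first", TruncationStrategy.ONLY_SECOND, False):
    print(repr(arg), "->", get_truncation_strategy(arg))
```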

sshleifer commented, Nov 5, 2020

Nice, thanks!
