Can't use padding in Wav2Vec2Tokenizer. TypeError: '<' not supported between instances of 'NoneType' and 'int'.
See original GitHub issue

Questions & Help
Details
I’m trying to get a tensor of labels from text in order to train a Wav2Vec2ForCTC model from scratch, but apparently pad_token_id is set to None, even though I’ve set a pad_token in my tokenizer.
This is my code:
# Generating the processor
from transformers import Wav2Vec2CTCTokenizer
from transformers import Wav2Vec2FeatureExtractor
from transformers import Wav2Vec2Processor

tokenizer = Wav2Vec2CTCTokenizer("./vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")
# sampling_rate is defined earlier in the notebook
feature_extractor = Wav2Vec2FeatureExtractor(feature_size=1, sampling_rate=sampling_rate, padding_value=0.0, do_normalize=True, return_attention_mask=False)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Tokenizing the transcripts into label ids
with processor.as_target_processor():
    batch["labels"] = processor(batch["text"], padding=True, max_length=1000, return_tensors="pt").input_ids
This is the error message:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-8-45831c0137f6> in <module>
9
10 # Processing
---> 11 data = prepare(data)
12 data["input"] = data["input"][0]
13 data["input"] = np.array([inp.T.reshape(12*4096) for inp in data["input"]])
<ipython-input-4-aaba15f24a61> in prepare(batch)
29 # Texts
30 with processor.as_target_processor():
---> 31 batch["labels"] = processor(batch["text"], padding = True, max_length = 1000, return_tensors="pt").input_ids
32
33 return batch
~/anaconda3/lib/python3.8/site-packages/transformers/models/wav2vec2/processing_wav2vec2.py in __call__(self, *args, **kwargs)
115 the above two methods for more information.
116 """
--> 117 return self.current_processor(*args, **kwargs)
118
119 def pad(self, *args, **kwargs):
~/anaconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py in __call__(self, text, text_pair, add_special_tokens, padding, truncation, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)
2252 if is_batched:
2253 batch_text_or_text_pairs = list(zip(text, text_pair)) if text_pair is not None else text
-> 2254 return self.batch_encode_plus(
2255 batch_text_or_text_pairs=batch_text_or_text_pairs,
2256 add_special_tokens=add_special_tokens,
~/anaconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py in batch_encode_plus(self, batch_text_or_text_pairs, add_special_tokens, padding, truncation, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)
2428
2429 # Backward compatibility for 'truncation_strategy', 'pad_to_max_length'
-> 2430 padding_strategy, truncation_strategy, max_length, kwargs = self._get_padding_truncation_strategies(
2431 padding=padding,
2432 truncation=truncation,
~/anaconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py in _get_padding_truncation_strategies(self, padding, truncation, max_length, pad_to_multiple_of, verbose, **kwargs)
2149
2150 # Test if we have a padding token
-> 2151 if padding_strategy != PaddingStrategy.DO_NOT_PAD and (not self.pad_token or self.pad_token_id < 0):
2152 raise ValueError(
2153 "Asking to pad but the tokenizer does not have a padding token. "
TypeError: '<' not supported between instances of 'NoneType' and 'int'
I’ve also tried setting the pad token with tokenizer.pad_token = "[PAD]". It didn’t work. Does anyone know what I’m doing wrong? Thanks.
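The comparison that fails in the traceback is self.pad_token_id < 0, which means pad_token_id is None rather than an integer: the tokenizer accepted the pad token string but could not resolve it to an id from vocab.json. A quick diagnostic and one possible workaround (a sketch under the assumption that the same vocab.json is used; add_special_tokens registers the token and assigns it an id if it is missing):

from transformers import Wav2Vec2CTCTokenizer

tokenizer = Wav2Vec2CTCTokenizer(
    "./vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)

# If this prints None, "[PAD]" did not resolve to an id from vocab.json.
print(tokenizer.pad_token_id)

if tokenizer.pad_token_id is None:
    # Registering the special token also adds it to the vocabulary,
    # so it receives a real id and padding can work again.
    tokenizer.add_special_tokens({"pad_token": "[PAD]"})
    print(tokenizer.pad_token_id)  # now an integer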
Issue Analytics
- Created: 2 years ago
- Reactions: 2
- Comments: 6 (2 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@patrickvonplaten I have the same error here, any help?
I am getting the same error when trying to use the GPT-2 tokenizer. I am trying to fine-tune a bert2gpt2 encoder-decoder model with your training scripts here: https://huggingface.co/patrickvonplaten/bert2gpt2-cnn_dailymail-fp16
I tried transformers 4.15.0 and 4.6.0; neither of them worked.
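For the GPT-2 case specifically, the pretrained tokenizer ships with no pad token at all, so the same None comparison fails as soon as padding is requested. A common workaround (a sketch, not necessarily what the linked training script does) is to reuse the EOS token for padding:

from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
print(tokenizer.pad_token)  # None out of the box

# Reuse the existing EOS token as the pad token; since it is already in
# the vocabulary, no new embedding row is needed.
tokenizer.pad_token = tokenizer.eos_token

batch = tokenizer(["hello world", "hi"], padding=True, return_tensors="pt")
print(batch["input_ids"].shape)  # padded to the longest sequence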