
Can't use padding in Wav2Vec2Tokenizer. TypeError: '<' not supported between instances of 'NoneType' and 'int'.

See original GitHub issue

Questions & Help

Details

I’m trying to get a tensor of labels from a text in order to train a Wav2Vec2ForCTC from scratch, but apparently pad_token_id is None, even though I’ve set a pad_token in my tokenizer.

This is my code:

# Generating the Processor
from transformers import Wav2Vec2CTCTokenizer
from transformers import Wav2Vec2FeatureExtractor
from transformers import Wav2Vec2Processor

tokenizer         = Wav2Vec2CTCTokenizer("./vocab.json", unk_token = "[UNK]", pad_token = "[PAD]", word_delimiter_token="|")
feature_extractor = Wav2Vec2FeatureExtractor(feature_size=1, sampling_rate=sampling_rate, padding_value=0.0, do_normalize=True, return_attention_mask=False)
processor         = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

with processor.as_target_processor():
    batch["labels"] = processor(batch["text"], padding = True, max_length = 1000, return_tensors="pt").input_ids

Error message is this:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-45831c0137f6> in <module>
      9 
     10 # Processing
---> 11 data = prepare(data)
     12 data["input"] = data["input"][0]
     13 data["input"] = np.array([inp.T.reshape(12*4096) for inp in data["input"]])

<ipython-input-4-aaba15f24a61> in prepare(batch)
     29     # Texts
     30     with processor.as_target_processor():
---> 31         batch["labels"] = processor(batch["text"], padding = True, max_length = 1000, return_tensors="pt").input_ids
     32 
     33     return batch

~/anaconda3/lib/python3.8/site-packages/transformers/models/wav2vec2/processing_wav2vec2.py in __call__(self, *args, **kwargs)
    115         the above two methods for more information.
    116         """
--> 117         return self.current_processor(*args, **kwargs)
    118 
    119     def pad(self, *args, **kwargs):

~/anaconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py in __call__(self, text, text_pair, add_special_tokens, padding, truncation, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)
   2252         if is_batched:
   2253             batch_text_or_text_pairs = list(zip(text, text_pair)) if text_pair is not None else text
-> 2254             return self.batch_encode_plus(
   2255                 batch_text_or_text_pairs=batch_text_or_text_pairs,
   2256                 add_special_tokens=add_special_tokens,

~/anaconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py in batch_encode_plus(self, batch_text_or_text_pairs, add_special_tokens, padding, truncation, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)
   2428 
   2429         # Backward compatibility for 'truncation_strategy', 'pad_to_max_length'
-> 2430         padding_strategy, truncation_strategy, max_length, kwargs = self._get_padding_truncation_strategies(
   2431             padding=padding,
   2432             truncation=truncation,

~/anaconda3/lib/python3.8/site-packages/transformers/tokenization_utils_base.py in _get_padding_truncation_strategies(self, padding, truncation, max_length, pad_to_multiple_of, verbose, **kwargs)
   2149 
   2150         # Test if we have a padding token
-> 2151         if padding_strategy != PaddingStrategy.DO_NOT_PAD and (not self.pad_token or self.pad_token_id < 0):
   2152             raise ValueError(
   2153                 "Asking to pad but the tokenizer does not have a padding token. "

TypeError: '<' not supported between instances of 'NoneType' and 'int'

I’ve also tried setting the pad_token with tokenizer.pad_token = "[PAD]". It didn’t work. Does anyone know what I’m doing wrong? Thanks.
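
The traceback shows what actually breaks: the tokenizer tests "not self.pad_token or self.pad_token_id < 0". The pad token string is set, so the first clause passes, but pad_token_id is resolved through the vocabulary, and when "[PAD]" (and the "[UNK]" fallback) is missing from vocab.json that lookup yields None, so "None < 0" raises exactly this TypeError. Below is a minimal diagnostic and repair sketch, assuming the ./vocab.json from the question; the repair step is one plausible fix, not a confirmed one.

# Diagnostic sketch: the path and tokenizer arguments are taken from the
# question above; the repair logic is an assumption, not a confirmed fix.
import json

from transformers import Wav2Vec2CTCTokenizer

with open("./vocab.json") as f:
    vocab = json.load(f)

# padding=True needs pad_token_id to be an int, and that id comes from the
# vocabulary file, so "[PAD]" (and "[UNK]") must actually be present in it.
print("[PAD]" in vocab, "[UNK]" in vocab)

# If they are missing, append them with fresh ids and rewrite the file.
for token in ["[UNK]", "[PAD]"]:
    if token not in vocab:
        vocab[token] = len(vocab)
with open("./vocab.json", "w") as f:
    json.dump(vocab, f)

tokenizer = Wav2Vec2CTCTokenizer(
    "./vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
print(tokenizer.pad_token_id)  # an int now, so padded calls no longer crash

Note that setting tokenizer.pad_token = "[PAD]" (as tried above) only changes the token string; the id still has to come from the vocabulary, which is why that attempt did not help.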

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 2
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
7AM7 commented, Nov 9, 2021

@patrickvonplaten I have the same error here, any help?

0 reactions
mehmetcalikus commented, Jan 14, 2022

I am getting the same error when trying to use the gpt2 tokenizer. I am trying to fine-tune a bert2gpt2 encoder-decoder model with your training scripts here: https://huggingface.co/patrickvonplaten/bert2gpt2-cnn_dailymail-fp16

I tried transformers 4.15.0 and 4.6.0; neither worked.
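
For the gpt2 case the cause is slightly different: GPT-2’s tokenizer ships with no pad token at all, so the same "pad_token_id < 0" check trips as soon as padding is requested. A common workaround is to reuse the EOS token for padding; a sketch, not necessarily the right choice for every training setup.

# Workaround sketch for GPT-2, which has no pad token by default. Reusing
# EOS for padding is a common convention, not the only option.
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
print(tokenizer.pad_token_id)  # None out of the box

tokenizer.pad_token = tokenizer.eos_token
batch = tokenizer(["short", "a somewhat longer example"], padding=True, return_tensors="pt")
print(batch["input_ids"].shape)  # both sequences padded to a common length

Unlike the Wav2Vec2 case above, assigning the pad token works here because the EOS string is already in GPT-2’s vocabulary, so its id resolves to an int. If the padded ids feed a loss, the pad positions are typically masked out (for example set to -100 in the labels) so the reused EOS token does not leak into training.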


