BertTokenizer: ValueError: Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.
🐛 Bug
Information
The tokenizer I am using is BertTokenizer. I've also tried AlbertTokenizer, but it makes no difference, so I suspect the bug is in the base tokenizer.
The language I am using the model on is English, but I don't believe that's relevant.
The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)
The task I am working on is:
- an official GLUE/SQuAD task: (give the name)
- my own task or dataset: (give details below)
To reproduce
Steps to reproduce the behavior:
- Version: transformers==2.11.0
- Run this code:

from transformers import BertModel, BertTokenizer

text = 'A quick brown fox jumps over'  # Just a dummy text
BertTokenizer.encode_plus(
    text.split(' '),
    None,
    add_special_tokens=True,
    max_length=512,
)
- This produces the following error:
Traceback (most recent call last):
File "classification.py", line 23, in <module>
max_length = 512)
File "D:\Programmering\Python\lib\site-packages\transformers\tokenization_utils.py", line 1576, in encode_plus
first_ids = get_input_ids(text)
File "D:\Programmering\Python\lib\site-packages\transformers\tokenization_utils.py", line 1556, in get_input_ids
"Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers."
ValueError: Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.
And yes, I’ve tried just inputting a string, and I still got the same error.
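A likely root cause (not stated in the original report, so treat this as an assumption): `encode_plus` is being called on the `BertTokenizer` class itself rather than on an instance. Python then binds `text.split(' ')` to `self`, so the actual `text` parameter receives `None`, which is exactly the input that `get_input_ids` rejects. The same binding pitfall can be sketched with a toy stand-in class, stdlib only:

```python
class Tokenizer:
    """Toy stand-in for a tokenizer class; NOT the real transformers API."""

    def encode_plus(self, text, pair=None):
        if not isinstance(text, (str, list, tuple)):
            raise ValueError(
                "Input is not valid. Should be a string, a list/tuple "
                "of strings or a list/tuple of integers."
            )
        return text.split() if isinstance(text, str) else list(text)

# Calling on the class: the first positional argument is consumed as
# `self`, so `text` silently becomes the *second* argument (here: None).
try:
    Tokenizer.encode_plus('A quick brown fox'.split(' '), None)
except ValueError as e:
    print('class-level call fails:', e)

# Calling on an instance binds the arguments as intended.
tok = Tokenizer()
print(tok.encode_plus('A quick brown fox'))  # ['A', 'quick', 'brown', 'fox']
```

With the real library the fix is the same shape: build an instance first, e.g. `tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')` (checkpoint name is an example), and call `tokenizer.encode_plus(...)` on that instance.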
Expected behavior
I expect the encode_plus function to return an encoded version of the input sequence.
Environment info
- transformers version: 2.11.0
- Platform: Windows
- Python version: 3.7.4
- PyTorch version (GPU?): 1.5.0+cpu
- Tensorflow version (GPU?): (Not used)
- Using GPU in script?: Nope
- Using distributed or parallel set-up in script?: No
Issue Analytics
- State:
- Created 3 years ago
- Comments: 14 (2 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I'm still facing the same issue: ValueError: Input [] is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers. This happens while running run_squad.py, training and evaluating with: https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
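The `Input []` variant of the error usually means an empty example reached the tokenizer: in transformers 2.x, `get_input_ids` only accepts a non-empty list of strings or ints, so empty, `None`, or NaN rows in a dataset trigger the same ValueError. A minimal, hedged sketch of filtering such rows before tokenization (the sample data is hypothetical; with pandas, NaN cells arrive as `float('nan')`, which the `isinstance` check catches):

```python
# Hypothetical raw examples; in practice these would come from a dataset.
examples = ['A quick brown fox', '', None, float('nan'), 'jumps over']

def is_valid(text):
    """Keep only non-empty strings; drop None/NaN/empty values that
    would raise 'Input ... is not valid' inside the tokenizer."""
    if not isinstance(text, str):
        return False  # catches None and float('nan')
    return len(text.strip()) > 0

clean = [t for t in examples if is_valid(t)]
print(clean)  # ['A quick brown fox', 'jumps over']
```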
Hi @mariusjohan, we welcome all models here 😃 The hub is a very easy way to share models. The way you're training it will surely differ from other trainings, so sharing it on the hub with details of how you trained it is always welcome!