BertTokenizer: ValueError: Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.


🐛 Bug

Information

The tokenizer I am using is BertTokenizer. I’ve also tried AlbertTokenizer, but it made no difference, so I suspect the bug is in the base tokenizer.

The language I am using the model on is English, but I don’t believe that’s the issue.

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQuAD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. Version: transformers==2.11.0
  2. Run this code
from transformers import BertModel, BertTokenizer
text = 'A quick brown fox jumps over' # Just a dummy text
BertTokenizer.encode_plus(
    text.split(' '),
    None,
    add_special_tokens = True,
    max_length = 512)
  3. This produces the following error:
Traceback (most recent call last):
  File "classification.py", line 23, in <module>
    max_length = 512)
  File "D:\Programmering\Python\lib\site-packages\transformers\tokenization_utils.py", line 1576, in encode_plus
    first_ids = get_input_ids(text)
  File "D:\Programmering\Python\lib\site-packages\transformers\tokenization_utils.py", line 1556, in get_input_ids
    "Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers."
ValueError: Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.

And yes, I’ve tried just inputting a string, and I still got the same error.
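One likely explanation, judging from the reproduction code itself: encode_plus is called on the BertTokenizer class rather than on a tokenizer instance. In Python 3, a method looked up on the class is just a plain function, so the word list is bound to self and the second positional argument, None, ends up in text. That would also explain why passing a plain string fails the same way. The sketch below illustrates this with a hypothetical ToyTokenizer whose input check is modeled on the error message; it is not the transformers implementation.

```python
class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer class; not the transformers API."""

    def encode_plus(self, text, text_pair=None, add_special_tokens=True, max_length=512):
        # Accept the three input kinds that the error message lists.
        ok = isinstance(text, str) or (
            isinstance(text, (list, tuple))
            and len(text) > 0
            and isinstance(text[0], (str, int))
        )
        if not ok:
            raise ValueError(
                "Input is not valid. Should be a string, a list/tuple of strings "
                "or a list/tuple of integers."
            )
        # A real tokenizer would return input_ids etc.; the toy just echoes the input.
        return {"text": text}


text = "A quick brown fox jumps over"


def call_on_class(first_arg):
    """Mirror the issue's call pattern: the method is looked up on the class."""
    try:
        # Python binds first_arg to self and None to text.
        ToyTokenizer.encode_plus(first_arg, None, add_special_tokens=True, max_length=512)
    except ValueError as exc:
        return str(exc)
    return None


list_error = call_on_class(text.split(" "))
string_error = call_on_class(text)

# Calling on an instance binds the word list to text, as intended.
tokenizer = ToyTokenizer()
result = tokenizer.encode_plus(text.split(" "), None, add_special_tokens=True, max_length=512)

print(list_error)
print(string_error)
print(result)
```

With the real library, the usual pattern is to create an instance first, for example tokenizer = BertTokenizer.from_pretrained("bert-base-uncased") (the checkpoint name is only an example), and then call tokenizer.encode_plus(text, add_special_tokens=True, max_length=512) on that instance.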

Expected behavior

I want the encode_plus function to return an encoded version of the input sequence.

Environment info

  • transformers version: 2.11.0
  • Platform: Windows
  • Python version: 3.7.4
  • PyTorch version (GPU?): 1.5.0+cpu
  • Tensorflow version (GPU?): (Not used)
  • Using GPU in script?: Nope
  • Using distributed or parallel set-up in script?: No

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 14 (2 by maintainers)

Top GitHub Comments

2 reactions
SarangSanjayGujar-lilly commented, Jun 25, 2020

I’m still facing the same issue while trying to run run_squad.py:
ValueError: Input [] is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.
I’m trying to train and test it with https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json and https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json
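The value printed in this message is an empty list, so the check evidently rejects an empty sequence as well as non-string values. A minimal sketch of such a check, modeled on the error message (check_input is a hypothetical name, and inspecting only the first element is an assumption of this sketch, not a claim about the transformers source), shows which inputs pass and which raise:

```python
def check_input(text):
    """Hypothetical check modeled on the error message; not the transformers source."""
    if isinstance(text, str):
        return "string"
    if isinstance(text, (list, tuple)) and len(text) > 0:
        # This sketch inspects only the first element to classify the sequence.
        if isinstance(text[0], str):
            return "list/tuple of strings"
        if isinstance(text[0], int):
            return "list/tuple of integers"
    # Anything else, including [], None and float("nan"), is rejected.
    raise ValueError(
        f"Input {text} is not valid. Should be a string, a list/tuple of strings "
        "or a list/tuple of integers."
    )


samples = ["A quick brown fox", ["A", "quick"], [101, 102], [], None, float("nan")]
for sample in samples:
    try:
        print(repr(sample), "->", check_input(sample))
    except ValueError as exc:
        print(repr(sample), "->", exc)
```

Because an empty list falls through both sequence branches, any step that produces an empty token list for a question or context would surface as "Input [] is not valid".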

1 reaction
LysandreJik commented, Jun 18, 2020

Hi @mariusjohan, we welcome all models here 😃 The hub is a very easy way to share models. The way you’re training it will surely be different to other trainings, so sharing it on the hub with details of how you trained it is always welcome!


Top Results From Across the Web

Bert Tokenizing error ValueError: Input nan is not valid ...
it gives me the error: "ValueError: Input nan is not valid. Should be a string, a list/tuple of strings or a list/tuple of...

[Solved] BertTokenizer error ValueError: Input nan is not valid ...
[Solved] BertTokenizer error ValueError: Input nan is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers ...

Source code for transformers.tokenization_utils - Hugging Face
else: raise ValueError( f"Input {text} is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.

3. Strings, lists, and tuples - Beginning Python Programming ...
index(value, [start, [stop]]) -> integer -- return first index of value. Raises ValueError if the value is not present. We will explore these...

India Immunization BERTish | Kaggle
I got ValueError: Input 0.8679001331 is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers. In Input...
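Several of these results hit the same error with nan or with a float such as 0.8679001331, which suggests a text column containing missing or numeric values. Under that assumption, one option is to drop non-string or blank entries before tokenizing. A minimal standard-library sketch; clean_texts is a hypothetical helper, not part of any library:

```python
def clean_texts(values):
    """Hypothetical helper: keep non-empty strings, record the positions of everything else."""
    kept, dropped = [], []
    for index, value in enumerate(values):
        if isinstance(value, str) and value.strip():
            kept.append(value)
        else:
            # Covers None, float("nan"), stray numbers and blank strings.
            dropped.append(index)
    return kept, dropped


raw = ["A quick brown fox", float("nan"), 0.8679001331, "jumps over", "   ", None]
texts, skipped = clean_texts(raw)
print(texts)    # ['A quick brown fox', 'jumps over']
print(skipped)  # [1, 2, 4, 5]
```

With pandas, DataFrame.dropna on the text column is a common alternative, but it only removes missing values; a column that also contains real numbers would still need a type check like the one above.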
