BertTokenizer: ValueError: Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.

🐛 Bug

Information

The tokenizer I am using is BertTokenizer. I’ve also tried AlbertTokenizer, but it makes no difference, so I suspect the bug is in the base tokenizer.

The language I am using the model on is English, but I don’t believe that’s the issue.

The problem arises when using:

the official example scripts: (give details below)
my own modified scripts: (give details below)

The task I am working on is:

an official GLUE/SQuAD task: (give the name)
my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

Version: transformers==2.11.0
Run this code

from transformers import BertModel, BertTokenizer
text = 'A quick brown fox jumps over' # Just a dummy text
BertTokenizer.encode_plus(
    text.split(' '),
    None,
    add_special_tokens = True,
    max_length = 512)

This produces the following error:

Traceback (most recent call last):
  File "classification.py", line 23, in <module>
    max_length = 512)
  File "D:\Programmering\Python\lib\site-packages\transformers\tokenization_utils.py", line 1576, in encode_plus
    first_ids = get_input_ids(text)
  File "D:\Programmering\Python\lib\site-packages\transformers\tokenization_utils.py", line 1556, in get_input_ids
    "Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers."
ValueError: Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.

And yes, I’ve tried just inputting a string, and I still got the same error.
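
One plausible reading of the snippet and traceback, offered as an assumption rather than a confirmed diagnosis: encode_plus is called on the BertTokenizer class itself, not on a tokenizer instance. In that case Python binds the first positional argument (the word list) to self and the second (None) to text, which would raise this ValueError no matter what text is passed. The sketch below reproduces that mechanism with a hypothetical stand-in class, ToyTokenizer, so it runs without transformers installed. The class, its vocabulary, and its simplified encode_plus body are illustrative only and are not the library's implementation.

```python
class ToyTokenizer:
    """Hypothetical stand-in that only mimics the argument order of
    encode_plus(self, text, text_pair=None, ...). Not the real library code."""

    def __init__(self, vocab):
        self.vocab = vocab

    def encode_plus(self, text, text_pair=None, add_special_tokens=True, max_length=None):
        # add_special_tokens and text_pair are accepted but ignored in this toy version.
        if isinstance(text, str):
            tokens = text.split(" ")
        elif isinstance(text, (list, tuple)) and all(isinstance(t, str) for t in text):
            tokens = list(text)
        else:
            raise ValueError(
                "Input is not valid. Should be a string, a list/tuple of strings "
                "or a list/tuple of integers."
            )
        ids = [self.vocab.get(tok.lower(), 0) for tok in tokens]
        return {"input_ids": ids[:max_length]}


words = "A quick brown fox jumps over".split(" ")
vocab = {w.lower(): i for i, w in enumerate(words, start=1)}

# Called on the class: `words` is bound to `self` and `None` to `text`.
class_call_error = None
try:
    ToyTokenizer.encode_plus(words, None, add_special_tokens=True, max_length=512)
except ValueError as err:
    class_call_error = str(err)

# Called on an instance: `words` is bound to `text` as intended.
tokenizer = ToyTokenizer(vocab)
encoded = tokenizer.encode_plus(words, None, add_special_tokens=True, max_length=512)

print(class_call_error)
print(encoded)
```

If this reading is right, the corresponding change with the real library would presumably be to create an instance first, for example with BertTokenizer.from_pretrained('bert-base-uncased'), and call encode_plus on that instance.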

Expected behavior

I want the encode_plus function to return an encoded version of the input sequence.

Environment info

transformers version: 2.11.0
Platform: Windows
Python version: 3.7.4
PyTorch version (GPU?): 1.5.0+cpu
Tensorflow version (GPU?): (Not used)
Using GPU in script?: Nope
Using distributed or parallel set-up in script?: No

Issue Analytics

Created 3 years ago
Comments: 14 (2 by maintainers)

Top GitHub Comments

2 reactions

SarangSanjayGujar-lilly commented, Jun 25, 2020

I’m still facing the same issue while trying to run run_squad.py: ValueError: Input [] is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers. I’m trying to train and test it with https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json and https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json

1 reaction

LysandreJik commented, Jun 18, 2020

Hi @mariusjohan, we welcome all models here 😃 The hub is a very easy way to share models. The way you’re training it will surely be different to other trainings, so sharing it on the hub with details of how you trained it is always welcome!