Generic text classification with TensorFlow error (AttributeError: 'TFTrainingArguments' object has no attribute 'args')

Environment info

transformers version: 3.2.0
Platform: Linux-4.15.0-1091-oem-x86_64-with-Ubuntu-18.04-bionic
Python version: 3.6.9
PyTorch version (GPU?): not installed (NA)
Tensorflow version (GPU?): 2.3.0 (True)
Using GPU in script?: <fill in>
Using distributed or parallel set-up in script?: <fill in>

Who can help

@jplu

Information

Model I am using (Bert, XLNet …): bert-base-multilingual-uncased

The problem arises when using:

the official example scripts: (give details below)
my own modified scripts: (give details below)

Running run_tf_text_classification.py with flags from the example in the “Run generic text classification script in TensorFlow” section of examples/text-classification.

The task I am working on is:

an official GLUE/SQUaD task: (give the name)
my own task or dataset: (give details below)

Text classification dataset for classifying answers to questions. Using 3 CSVs (train, dev, and test) that each have headers (class, text) and columns containing class labels (int) and questions (strings). For reference, there are no commas in the questions.
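
For reference, the CSV layout described above can be sketched with the standard library. The row contents here are hypothetical placeholders, not rows from the actual dataset; only the header names (class, text) and the column order come from the description.

```python
import csv
import io

# Hypothetical rows: an integer class label followed by a comma-free text string.
rows = [
    (0, "hypothetical example text one"),
    (1, "hypothetical example text two"),
]

# Write the header and rows into an in-memory buffer instead of a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["class", "text"])
writer.writerows(rows)

# Read the data back to confirm the header and column order.
buffer.seek(0)
parsed = list(csv.DictReader(buffer))
print(parsed[0]["class"], parsed[0]["text"])
```

With this layout the label sits in column 0, which is what `--label_column_id 0` in the command below refers to.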

To reproduce

Steps to reproduce the behavior:

Call run_tf_text_classification.py with flags from the example in the “Run generic text classification script in TensorFlow” section of examples/text-classification:

python run_tf_text_classification.py \
  --train_file train.csv \
  --dev_file dev.csv \
  --test_file test.csv \
  --label_column_id 0 \
  --model_name_or_path bert-base-multilingual-uncased \
  --output_dir model \
  --num_train_epochs 4 \
  --per_device_train_batch_size 16 \
  --per_device_eval_batch_size 32 \
  --do_train \
  --do_eval \
  --do_predict \
  --logging_steps 10 \
  --evaluate_during_training \
  --save_steps 10 \
  --overwrite_output_dir \
  --max_seq_length 128

The following error is encountered:

Traceback (most recent call last):
  File "run_tf_text_classification.py", line 283, in <module>
    main()
  File "run_tf_text_classification.py", line 199, in main
    training_args.n_replicas,
  File "/home/qd_team/qdmr_gpu/smart_env/lib/python3.6/site-packages/transformers/file_utils.py", line 936, in wrapper
    return func(*args, **kwargs)
  File "/home/qd_team/qdmr_gpu/smart_env/lib/python3.6/site-packages/transformers/training_args_tf.py", line 180, in n_replicas
    return self._setup_strategy.num_replicas_in_sync
  File "/home/qd_team/qdmr_gpu/smart_env/lib/python3.6/site-packages/transformers/file_utils.py", line 914, in __get__
    cached = self.fget(obj)
  File "/home/qd_team/qdmr_gpu/smart_env/lib/python3.6/site-packages/transformers/file_utils.py", line 936, in wrapper
    return func(*args, **kwargs)
  File "/home/qd_team/qdmr_gpu/smart_env/lib/python3.6/site-packages/transformers/training_args_tf.py", line 122, in _setup_strategy
    if self.args.xla:
AttributeError: 'TFTrainingArguments' object has no attribute 'args'
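
The last frame shows a method on the arguments object reading `self.args.xla`, although the object itself is the arguments container. Below is a minimal sketch of that failure pattern. `SketchArguments` and both method names are hypothetical stand-ins, not the transformers source, and the "fixed" variant assumes that `xla` is a field defined directly on the object.

```python
from dataclasses import dataclass


@dataclass
class SketchArguments:
    """Hypothetical stand-in for an arguments dataclass; not the transformers class."""

    xla: bool = False

    def setup_strategy_buggy(self) -> str:
        # Mirrors the failing line: the object looks up an `args` attribute on itself.
        if self.args.xla:
            return "xla"
        return "default"

    def setup_strategy_fixed(self) -> str:
        # Assumed fix: read the field from the object directly.
        if self.xla:
            return "xla"
        return "default"


args = SketchArguments(xla=True)

try:
    args.setup_strategy_buggy()
except AttributeError as err:
    buggy_message = str(err)

fixed_result = args.setup_strategy_fixed()
print(buggy_message)
print(fixed_result)
```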

If the logger.info call (lines 197-202) is commented out, the above error is avoided, but another error is encountered:

Traceback (most recent call last):
  File "run_tf_text_classification.py", line 282, in <module>
    main()
  File "run_tf_text_classification.py", line 221, in main
    max_seq_length=data_args.max_seq_length,
  File "run_tf_text_classification.py", line 42, in get_tfds
    ds = datasets.load_dataset("csv", data_files=files)
  File "/home/qd_team/qdmr_gpu/smart_env/lib/python3.6/site-packages/datasets/load.py", line 604, in load_dataset
    **config_kwargs,
  File "/home/qd_team/qdmr_gpu/smart_env/lib/python3.6/site-packages/datasets/builder.py", line 158, in __init__
    **config_kwargs,
  File "/home/qd_team/qdmr_gpu/smart_env/lib/python3.6/site-packages/datasets/builder.py", line 269, in _create_builder_config
    for key in sorted(data_files.keys()):
TypeError: '<' not supported between instances of 'NamedSplit' and 'NamedSplit'
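
This second traceback ends in `sorted(data_files.keys())`, which fails when the dictionary keys are objects that define no ordering. The sketch below reproduces that pattern with a hypothetical `FakeSplit` class standing in for `NamedSplit`, and shows one possible workaround: keying the mapping by plain strings. Whether the rest of the script accepts string keys is an assumption not verified here.

```python
class FakeSplit:
    """Hypothetical stand-in for a split-name object that defines no `<` ordering."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __str__(self) -> str:
        return self.name


# Mapping keyed by split objects, similar in shape to the data_files argument.
files = {
    FakeSplit("train"): ["train.csv"],
    FakeSplit("validation"): ["dev.csv"],
    FakeSplit("test"): ["test.csv"],
}

try:
    sorted(files.keys())
except TypeError as err:
    sort_error = str(err)

# Workaround sketch: convert the keys to strings, which are orderable.
string_keyed = {str(split): paths for split, paths in files.items()}
sorted_names = sorted(string_keyed.keys())

print(sort_error)
print(sorted_names)
```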

Here is a pip freeze:

absl-py==0.10.0
astunparse==1.6.3
cachetools==4.1.1
certifi==2020.6.20
chardet==3.0.4
click==7.1.2
dataclasses==0.7
datasets==1.0.2
dill==0.3.2
filelock==3.0.12
gast==0.3.3
google-auth==1.21.3
google-auth-oauthlib==0.4.1
google-pasta==0.2.0
grpcio==1.32.0
h5py==2.10.0
idna==2.10
importlib-metadata==2.0.0
joblib==0.16.0
Keras-Preprocessing==1.1.2
Markdown==3.2.2
numpy==1.18.5
oauthlib==3.1.0
opt-einsum==3.3.0
packaging==20.4
pandas==1.1.2
protobuf==3.13.0
pyarrow==1.0.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2020.1
regex==2020.7.14
requests==2.24.0
requests-oauthlib==1.3.0
rsa==4.6
sacremoses==0.0.43
scipy==1.4.1
sentencepiece==0.1.91
six==1.15.0
tensorboard==2.3.0
tensorboard-plugin-wit==1.7.0
tensorflow==2.3.0
tensorflow-estimator==2.3.0
termcolor==1.1.0
tokenizers==0.8.1rc2
tqdm==4.49.0
transformers==3.2.0
urllib3==1.25.10
Werkzeug==1.0.1
wrapt==1.12.1
xxhash==2.0.0
zipp==3.2.0