
The PyTorch example question-answering/run_qa_beam_search.py does not work

See original GitHub issue

Environment info

  • transformers version: git+https://github.com/huggingface/transformers
  • Platform:
  • Python version: 3.8
  • PyTorch version (GPU?): 1.10.0
  • Using GPU in script?: yes

Who can help

@pvl @vanpelt @NielsRogge @sgugger

Models:

  • T5: gsarti/it5-base
  • encoder-decoder models (For example, BlenderBot, BART, Marian, Pegasus, T5, ByT5): gsarti/it5-base
  • Pytorch: 1.10.0

If the model isn’t in the list, ping @LysandreJik who will redirect you to the correct contributor.


Information

Model I am using (Bert, XLNet …): T5

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The tasks I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. clone code: git clone https://gitlab.com/nicolalandro/qandatrain.git (this repo contains a copy of the official training script, with the library versions pinned in requirements.txt)
  2. go into the code folder: cd qandatrain
  3. install requirements: pip install -r requirements.txt
  4. clone dataset: git clone https://huggingface.co/datasets/z-uo/squad-it
  5. run the code:
python src/run_qa_beam_search.py \
  --model_name_or_path gsarti/it5-base \
  --tokenizer_name gsarti/it5-base \
  --dataset_name squad \
  --train_file "squad-it/SQuAD_it-train_processed.json" \
  --validation_file "squad-it/SQuAD_it-test_processed.json" \
  --do_train \
  --do_eval \
  --per_device_train_batch_size 3 \
  --learning_rate 3e-5 \
  --num_train_epochs 2 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir it5-squad
  6. you obtain the following error:
...
Traceback (most recent call last):
  File "src/run_qa_beam_search.py", line 696, in <module>
    main()
  File "src/run_qa_beam_search.py", line 454, in main
    train_dataset = train_dataset.map(
  File "/media/mint/Barracuda/Project/qandatrain/venv/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2036, in map
    return self._map_single(
  File "/media/mint/Barracuda/Project/qandatrain/venv/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 503, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/media/mint/Barracuda/Project/qandatrain/venv/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 470, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/media/mint/Barracuda/Project/qandatrain/venv/lib/python3.8/site-packages/datasets/fingerprint.py", line 406, in wrapper
    out = func(self, *args, **kwargs)
  File "/media/mint/Barracuda/Project/qandatrain/venv/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2404, in _map_single
    batch = apply_function_on_filtered_inputs(
  File "/media/mint/Barracuda/Project/qandatrain/venv/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2291, in apply_function_on_filtered_inputs
    function(*fn_args, effective_indices, **fn_kwargs) if with_indices else function(*fn_args, **fn_kwargs)
  File "/media/mint/Barracuda/Project/qandatrain/venv/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1991, in decorated
    result = f(decorated_item, *args, **kwargs)
  File "src/run_qa_beam_search.py", line 386, in prepare_train_features
    cls_index = input_ids.index(tokenizer.cls_token_id)
ValueError: 32005 is not in list

It seems to be a tokenizer error: the CLS token id is not found in the encoded input sequence.
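The traceback shows where this happens: prepare_train_features calls input_ids.index(tokenizer.cls_token_id), and list.index raises ValueError when the id is absent. T5-style encoder-decoder tokenizers do not insert a CLS token into the encoding, so the lookup fails even when the tokenizer reports a cls_token_id (here 32005). A minimal sketch of the failure, with illustrative placeholder ids rather than the real it5-base vocabulary:

```python
# Minimal sketch of the crash in prepare_train_features.
# Token ids below are illustrative placeholders, not real it5-base ids.
input_ids = [100, 200, 300, 1]   # a T5-style encoding: no CLS token inserted
cls_token_id = 32005             # id the tokenizer reports for its CLS token

try:
    cls_index = input_ids.index(cls_token_id)  # what the script does
except ValueError as err:
    print(err)  # 32005 is not in list
```

A defensive check like `if cls_token_id in input_ids:` would avoid the crash, but the root cause is that the script's feature preparation assumes the model inserts a CLS token into every sequence, which encoder-decoder models like T5 do not.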

Expected behavior

Train the T5 model for question answering on squad-it and create the trained model files at output_dir

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
nicolalandro commented, Oct 29, 2021

Perfect, with that param the train ended correctly. Thank you!

0 reactions
karthikrangasai commented, Oct 28, 2021

@nicolalandro I had the same error when writing tests for the script. You should use the --predict_with_generate flag.
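The suggested fix can be sketched as an adjusted launch command. This is a sketch under assumptions: the comment does not show the exact invocation, and in the transformers examples --predict_with_generate is a Seq2SeqTrainingArguments option used by the seq2seq question-answering script (run_seq2seq_qa.py), so switching to that script may also be needed for an encoder-decoder model like T5.

```shell
# Sketch of the suggested fix: same invocation as above with the flag appended
# (other hyperparameters unchanged).
python src/run_qa_beam_search.py \
  --model_name_or_path gsarti/it5-base \
  --tokenizer_name gsarti/it5-base \
  --train_file "squad-it/SQuAD_it-train_processed.json" \
  --validation_file "squad-it/SQuAD_it-test_processed.json" \
  --do_train --do_eval \
  --output_dir it5-squad \
  --predict_with_generate
```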

Read more comments on GitHub >
