The PyTorch example question-answering/run_qa_beam_search.py does not work
See original GitHub issue

Environment info
- transformers version: git+https://github.com/huggingface/transformers
- Platform:
- Python version: 3.8
- PyTorch version (GPU?): 1.10.0
- Using GPU in script?: yes
Who can help
@pvl @vanpelt @NielsRogge @sgugger
Models:
- T5: gsarti/it5-base
- encoder-decoder models (For example, BlenderBot, BART, Marian, Pegasus, T5, ByT5): gsarti/it5-base
- Pytorch: 1.10.0
If the model isn’t in the list, ping @LysandreJik who will redirect you to the correct contributor.
HF projects:
- datasets: squad-it, adapted from github squad-it
Examples:
- maintained examples (not research project or legacy): question-answering/run_qa_beam_search.py
Information
Model I am using (Bert, XLNet …): T5
The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)
The tasks I am working on are:
- an official GLUE/SQUaD task: (give the name)
- my own task or dataset: (give details below)
To reproduce
Steps to reproduce the behavior:
- clone code:
git clone https://gitlab.com/nicolalandro/qandatrain.git
(in this repo I copied the official training files and pinned the library versions in requirements)
- go into the code folder:
cd qandatrain
- install requirements:
pip install -r requirements.txt
- clone dataset:
git clone https://huggingface.co/datasets/z-uo/squad-it
- run the code:
python src/run_qa_beam_search.py \
--model_name_or_path gsarti/it5-base \
--tokenizer_name gsarti/it5-base \
--dataset_name squad \
--train_file "squad-it/SQuAD_it-train_processed.json" \
--validation_file "squad-it/SQuAD_it-test_processed.json" \
--do_train \
--do_eval \
--per_device_train_batch_size 3 \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir it5-squad
- you obtain the following error:
...
Traceback (most recent call last):
File "src/run_qa_beam_search.py", line 696, in <module>
main()
File "src/run_qa_beam_search.py", line 454, in main
train_dataset = train_dataset.map(
File "/media/mint/Barracuda/Project/qandatrain/venv/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2036, in map
return self._map_single(
File "/media/mint/Barracuda/Project/qandatrain/venv/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 503, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/media/mint/Barracuda/Project/qandatrain/venv/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 470, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/media/mint/Barracuda/Project/qandatrain/venv/lib/python3.8/site-packages/datasets/fingerprint.py", line 406, in wrapper
out = func(self, *args, **kwargs)
File "/media/mint/Barracuda/Project/qandatrain/venv/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2404, in _map_single
batch = apply_function_on_filtered_inputs(
File "/media/mint/Barracuda/Project/qandatrain/venv/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 2291, in apply_function_on_filtered_inputs
function(*fn_args, effective_indices, **fn_kwargs) if with_indices else function(*fn_args, **fn_kwargs)
File "/media/mint/Barracuda/Project/qandatrain/venv/lib/python3.8/site-packages/datasets/arrow_dataset.py", line 1991, in decorated
result = f(decorated_item, *args, **kwargs)
File "src/run_qa_beam_search.py", line 386, in prepare_train_features
cls_index = input_ids.index(tokenizer.cls_token_id)
ValueError: 32005 is not in list
It seems to be a tokenizer error: some token is not found in the dictionary or in the sentences.
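The `ValueError` at the end of the traceback comes from plain `list.index()`, which raises when the requested element is absent. A minimal sketch of the failure mode, using hypothetical token ids (32005 stands in for the tokenizer's `cls_token_id` from the traceback; a T5-style encoding contains no CLS token, unlike the BERT/XLNet encodings the beam-search script expects):

```python
CLS_TOKEN_ID = 32005  # hypothetical cls_token_id, as reported in the traceback

def find_cls_index(input_ids, cls_token_id):
    """Mimic the script's `input_ids.index(tokenizer.cls_token_id)` call."""
    try:
        return input_ids.index(cls_token_id)
    except ValueError:
        # The encoded sequence contains no CLS token, so index() raises
        # exactly as in the reported "32005 is not in list" error.
        return None

# A T5-style encoding: content ids plus an end-of-sequence id, no CLS token.
input_ids = [256, 1024, 77, 1]
print(find_cls_index(input_ids, CLS_TOKEN_ID))                    # no CLS id present
print(find_cls_index([CLS_TOKEN_ID] + input_ids, CLS_TOKEN_ID))   # CLS id at position 0
```

In the real script the exception is not caught, so the `map()` call over the training set aborts with the traceback shown above.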
Expected behavior
Train the T5 model for question answering on squad-it and create the trained model files at output_dir
Issue Analytics
- State:
- Created 2 years ago
- Comments:8 (4 by maintainers)
@nicolalandro I had the same error when writing tests for the script. You should use the --predict_with_generate flag.

Perfect with that param! The training ended correctly, thank you!
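For reference, a sketch of the adjusted invocation with the suggested flag appended to the original command. (Assumption: the thread does not show the final command, and --predict_with_generate is a Seq2SeqTrainingArguments option, so the working run may actually go through the seq2seq QA example script rather than run_qa_beam_search.py.)

```shell
python src/run_qa_beam_search.py \
  --model_name_or_path gsarti/it5-base \
  --tokenizer_name gsarti/it5-base \
  --dataset_name squad \
  --train_file "squad-it/SQuAD_it-train_processed.json" \
  --validation_file "squad-it/SQuAD_it-test_processed.json" \
  --do_train \
  --do_eval \
  --predict_with_generate \
  --per_device_train_batch_size 3 \
  --learning_rate 3e-5 \
  --num_train_epochs 2 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir it5-squad
```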