
GPT2 IndexError: index out of range in functional.py when running run_clm.py with any special tokens added (even eos and bos only)

See original GitHub issue

Hi all, I need your help as I’m stuck on an IndexError while trying to fine-tune GPT2 with run_clm.py and added special tokens. The error is triggered at this line of functional.py: return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

run_clm.py has been "barely" modified, only adding the tokens with tokenizer.add_special_tokens. See below for details of the modification, the args used, and the error log.

After weeks of preparing datasets, we hope to use your amazing scripts and library for an awesome AI project. I need your help please! 👍

Environment info

  • transformers version: 4.5.0
  • Platform: Darwin-20.2.0-x86_64-i386-64bit
  • Python version: 3.7.9
  • PyTorch version (GPU?): 1.8.1 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: NO
  • Using distributed or parallel set-up in script?: NO

Also tried on Windows with CUDA 11.1, the same transformers version, the same Python version, etc.; same issue.

Who can help

@patrickvonplaten, @LysandreJik, @sgugger

Information

Model I am using (Bert, XLNet …): GPT2 Medium

The problem arises when using:

  • the official example scripts: (give details below)

The task I am working on is:

  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. Run transformers/examples/language-modeling/run_clm.py with the following args (see below). You can probably reproduce the exact same issue with any dataset; it does not look like a dataset-related problem, since training works without the special tokens added.
  2. The file run_clm.py has been modified slightly, just to include the eos token, bos token and additional special tokens (see below). The issue persists as long as I add any of these special tokens. The only solution seems to be to have no special tokens at all with this GPT2 fine-tuning code, which is unfortunate because I need them for my purpose. 😃

ARGS

python transformers/examples/language-modeling/run_clm.py \
--output_dir "models/output/" \
--model_type "gpt2" \
--model_name_or_path "models/original/" \
--tokenizer_name "gpt2" \
--cache_dir "models/cache/" \
--no_use_fast_tokenizer \
--do_train True \
--train_file "models/datasets/dataset-training-05042021.txt" \
--do_eval True \
--validation_file "models/datasets/dataset-validation-05042021.txt" \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 1 \
--save_steps 500 \
--num_train_epochs 5 \
--learning_rate 5e-5 \
--weight_decay 0 \
--adam_beta1 0.9 \
--adam_beta2 0.999 \
--adam_epsilon 1e-8 \
--max_grad_norm 1.0 \
--no_cuda True \
--seed 123456 \
--fp16 False \
--fp16_opt_level "O1" \
--fp16_backend "auto" \
--fp16_full_eval False

CODE MODIFICATION

I added this code on line 308 of run_clm.py, just before the model.resize_token_embeddings(len(tokenizer)) call:

    special_tokens_dict = {
        'bos_token': '<|startoftext|>',
        'eos_token': '<|endoftext|>',
        'additional_special_tokens': [
             "<A>",
             "<B>",
             "<C>",
             "<D>",
             "<E>",
             "<F>",
             "<G>",
             "<H>"
         ]
    }
    num_added_toks = tokenizer.add_special_tokens(special_tokens_dict)
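
For context, here is a minimal, self-contained sketch of the pattern being attempted, using the public gpt2 checkpoint instead of the issue's local models/original/ folder (purely illustrative, not part of run_clm.py). The key point is that the added special tokens get ids at and above 50257, so the embedding matrix has to be resized before any forward pass:

    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    # Illustrative only: the issue uses a local checkpoint in models/original/.
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    special_tokens_dict = {
        'bos_token': '<|startoftext|>',
        'eos_token': '<|endoftext|>',
        'additional_special_tokens': [
            "<A>", "<B>", "<C>", "<D>", "<E>", "<F>", "<G>", "<H>"
        ]
    }
    num_added_toks = tokenizer.add_special_tokens(special_tokens_dict)

    # Without this resize, input ids >= 50257 (the newly added tokens) fall outside
    # the wte embedding table and the forward pass fails with
    # "IndexError: index out of range in self".
    model.resize_token_embeddings(len(tokenizer))

    inputs = tokenizer("<|startoftext|> <A> hello <|endoftext|>", return_tensors="pt")
    outputs = model(**inputs, labels=inputs["input_ids"])  # runs once embeddings are resized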

ISSUE LOGS

04/06/2021 17:48:36 - WARNING - __main__ -   Process rank: -1, device: cpu, n_gpu: 0distributed training: False, 16-bits training: False
04/06/2021 17:48:36 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(output_dir=models/output/, overwrite_output_dir=False, do_train=True, do_eval=True, do_predict=False, evaluation_strategy=IntervalStrategy.NO, prediction_loss_only=False, per_device_train_batch_size=1, per_device_eval_batch_size=1, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=5.0, max_steps=-1, lr_scheduler_type=SchedulerType.LINEAR, warmup_ratio=0.0, warmup_steps=0, logging_dir=runs/Apr06_17-48-36_BLABLABLA-MacBook-Air.local, logging_strategy=IntervalStrategy.STEPS, logging_first_step=False, logging_steps=500, save_strategy=IntervalStrategy.STEPS, save_steps=500, save_total_limit=None, no_cuda=True, seed=261184, fp16=False, fp16_opt_level=O1, fp16_backend=auto, fp16_full_eval=False, local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=0, past_index=-1, run_name=models/output/, disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=False, metric_for_best_model=None, greater_is_better=None, ignore_data_skip=False, sharded_ddp=[], deepspeed=None, label_smoothing_factor=0.0, adafactor=False, group_by_length=False, length_column_name=length, report_to=['tensorboard'], ddp_find_unused_parameters=None, dataloader_pin_memory=True, skip_memory_metrics=False, _n_gpu=0, mp_parameters=)
04/06/2021 17:48:36 - WARNING - datasets.builder -   Using custom data configuration default-544362d6d13a5db7
04/06/2021 17:48:36 - WARNING - datasets.builder -   Reusing dataset text (/Users/blablabla/.cache/huggingface/datasets/text/default-544362d6d13a5db7/0.0.0/e16f44aa1b321ece1f87b07977cc5d70be93d69b20486d6dacd62e12cf25c9a5)
[INFO|configuration_utils.py:488] 2021-04-06 17:48:36,800 >> loading configuration file models/original/config.json
[INFO|configuration_utils.py:526] 2021-04-06 17:48:36,802 >> Model config GPT2Config {
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "gradient_checkpointing": false,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "resid_pdrop": 0.1,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "transformers_version": "4.5.0",
  "use_cache": true,
  "vocab_size": 50257
}

[INFO|configuration_utils.py:490] 2021-04-06 17:48:37,245 >> loading configuration file https://huggingface.co/gpt2/resolve/main/config.json from cache at models/cache/fc674cd6907b4c9e933cb42d67662436b89fa9540a1f40d7c919d0109289ad01.7d2e0efa5ca20cef4fb199382111e9d3ad96fd77b849e1d4bed13a66e1336f51
[INFO|configuration_utils.py:526] 2021-04-06 17:48:37,247 >> Model config GPT2Config {
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "gradient_checkpointing": false,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "resid_pdrop": 0.1,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "transformers_version": "4.5.0",
  "use_cache": true,
  "vocab_size": 50257
}

[INFO|tokenization_utils_base.py:1707] 2021-04-06 17:48:39,085 >> loading file https://huggingface.co/gpt2/resolve/main/vocab.json from cache at models/cache/684fe667923972fb57f6b4dcb61a3c92763ad89882f3da5da9866baf14f2d60f.c7ed1f96aac49e745788faa77ba0a26a392643a50bb388b9c04ff469e555241f
[INFO|tokenization_utils_base.py:1707] 2021-04-06 17:48:39,085 >> loading file https://huggingface.co/gpt2/resolve/main/merges.txt from cache at models/cache/c0c761a63004025aeadd530c4c27b860ec4ecbe8a00531233de21d865a402598.5d12962c5ee615a4c803841266e9c3be9a691a924f72d395d3a6c6c81157788b
[INFO|tokenization_utils_base.py:1707] 2021-04-06 17:48:39,086 >> loading file https://huggingface.co/gpt2/resolve/main/added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:1707] 2021-04-06 17:48:39,086 >> loading file https://huggingface.co/gpt2/resolve/main/special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:1707] 2021-04-06 17:48:39,086 >> loading file https://huggingface.co/gpt2/resolve/main/tokenizer_config.json from cache at None
[INFO|tokenization_utils_base.py:1707] 2021-04-06 17:48:39,086 >> loading file https://huggingface.co/gpt2/resolve/main/tokenizer.json from cache at models/cache/16a2f78023c8dc511294f0c97b5e10fde3ef9889ad6d11ffaa2a00714e73926e.cf2d0ecb83b6df91b3dbb53f1d1e4c311578bfd3aa0e04934215a49bf9898df0
[INFO|modeling_utils.py:1050] 2021-04-06 17:48:39,223 >> loading weights file models/original/pytorch_model.bin
[INFO|modeling_utils.py:1168] 2021-04-06 17:48:45,948 >> All model checkpoint weights were used when initializing GPT2LMHeadModel.

[INFO|modeling_utils.py:1177] 2021-04-06 17:48:45,949 >> All the weights of GPT2LMHeadModel were initialized from the model checkpoint at models/original/.
If your task is similar to the task the model of the checkpoint was trained on, you can already use GPT2LMHeadModel for predictions without further training.
[INFO|tokenization_utils_base.py:873] 2021-04-06 17:48:45,949 >> Assigning <|startoftext|> to the bos_token key of the tokenizer
[INFO|tokenization_utils.py:207] 2021-04-06 17:48:45,950 >> Adding <|startoftext|> to the vocabulary
[INFO|tokenization_utils_base.py:873] 2021-04-06 17:48:45,950 >> Assigning <|endoftext|> to the eos_token key of the tokenizer
[INFO|tokenization_utils_base.py:873] 2021-04-06 17:48:45,950 >> Assigning ['<A>', '<B>', '<C>', '<D>', '<E>', '<F>', '<G>', '<H>'] to the additional_special_tokens key of the tokenizer
[INFO|tokenization_utils.py:207] 2021-04-06 17:48:45,950 >> Adding <A> to the vocabulary
[INFO|tokenization_utils.py:207] 2021-04-06 17:48:45,950 >> Adding <B> to the vocabulary
[INFO|tokenization_utils.py:207] 2021-04-06 17:48:45,950 >> Adding <C> to the vocabulary
[INFO|tokenization_utils.py:207] 2021-04-06 17:48:45,950 >> Adding <D> to the vocabulary
[INFO|tokenization_utils.py:207] 2021-04-06 17:48:45,950 >> Adding <E> to the vocabulary
[INFO|tokenization_utils.py:207] 2021-04-06 17:48:45,950 >> Adding <F> to the vocabulary
[INFO|tokenization_utils.py:207] 2021-04-06 17:48:45,950 >> Adding <G> to the vocabulary
[INFO|tokenization_utils.py:207] 2021-04-06 17:48:45,950 >> Adding <H> to the vocabulary
100%|████████████████████| 199/199 [01:15<00:00,  2.62ba/s]
100%|████████████████████| 10/10 [00:03<00:00,  2.69ba/s]
100%|████████████████████| 199/199 [01:02<00:00,  3.17ba/s]
100%|████████████████████| 10/10 [00:02<00:00,  3.39ba/s]
[INFO|trainer.py:921] 2021-04-06 17:51:21,859 >> Loading model from models/original/).
[INFO|configuration_utils.py:488] 2021-04-06 17:51:21,924 >> loading configuration file models/original/config.json
[INFO|configuration_utils.py:526] 2021-04-06 17:51:21,931 >> Model config GPT2Config {
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "gradient_checkpointing": false,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "resid_pdrop": 0.1,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "transformers_version": "4.5.0",
  "use_cache": true,
  "vocab_size": 50257
}

[INFO|modeling_utils.py:1050] 2021-04-06 17:51:21,950 >> loading weights file models/original/pytorch_model.bin
[INFO|modeling_utils.py:1168] 2021-04-06 17:51:31,409 >> All model checkpoint weights were used when initializing GPT2LMHeadModel.

[INFO|modeling_utils.py:1177] 2021-04-06 17:51:31,409 >> All the weights of GPT2LMHeadModel were initialized from the model checkpoint at models/original/.
If your task is similar to the task the model of the checkpoint was trained on, you can already use GPT2LMHeadModel for predictions without further training.
[INFO|trainer.py:1013] 2021-04-06 17:51:31,478 >> ***** Running training *****
[INFO|trainer.py:1014] 2021-04-06 17:51:31,483 >>   Num examples = 8199
[INFO|trainer.py:1015] 2021-04-06 17:51:31,489 >>   Num Epochs = 5
[INFO|trainer.py:1016] 2021-04-06 17:51:31,489 >>   Instantaneous batch size per device = 1
[INFO|trainer.py:1017] 2021-04-06 17:51:31,489 >>   Total train batch size (w. parallel, distributed & accumulation) = 1
[INFO|trainer.py:1018] 2021-04-06 17:51:31,489 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:1019] 2021-04-06 17:51:31,489 >>   Total optimization steps = 40995
  0%|                    | 0/40995 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "transformers/examples/language-modeling/run_clm.py", line 459, in <module>
    main()
  File "transformers/examples/language-modeling/run_clm.py", line 424, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/Users/blablabla/Developer/Training/env/lib/python3.7/site-packages/transformers/trainer.py", line 1120, in train
    tr_loss += self.training_step(model, inputs)
  File "/Users/blablabla/Developer/Training/env/lib/python3.7/site-packages/transformers/trainer.py", line 1524, in training_step
    loss = self.compute_loss(model, inputs)
  File "/Users/blablabla/Developer/Training/env/lib/python3.7/site-packages/transformers/trainer.py", line 1556, in compute_loss
    outputs = model(**inputs)
  File "/Users/blablabla/Developer/Training/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/blablabla/Developer/Training/env/lib/python3.7/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 917, in forward
    return_dict=return_dict,
  File "/Users/blablabla/Developer/Training/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/blablabla/Developer/Training/env/lib/python3.7/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 694, in forward
    inputs_embeds = self.wte(input_ids)
  File "/Users/blablabla/Developer/Training/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/blablabla/Developer/Training/env/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 158, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/Users/blablabla/Developer/Training/env/lib/python3.7/site-packages/torch/nn/functional.py", line 1921, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
  0%|                    | 0/40995 [00:00<?, ?it/s]

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
MoonshotQuest commented, Apr 8, 2021

Great, thank you so much @sgugger @LysandreJik! That makes sense now. I removed the line and it works perfectly. 👍

I will let you know when we get closer to a launch date for our AI-based game. It’s going to be awesome! Sorry to hijack this thread, but does Huggingface have a place to showcase apps made using your incredible libraries? 😊

1 reaction
sgugger commented, Apr 8, 2021

Ah, this is because your checkpoint folder needs to already contain the resized weights: the model is resized inside the script, but since the model path is a local folder, it is also passed as a checkpoint to the Trainer later in the script, which then reloads the model from that folder, this time without the model.resize_token_embeddings(len(tokenizer)) call. So you have two solutions:

  • either load your model, apply model.resize_token_embeddings(len(tokenizer)), then resave it (a sketch of this option follows the list below);
  • or remove the line that interprets the folder as a checkpoint here.
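
A minimal sketch of the first option, with illustrative paths ("models/original/" is the local checkpoint mentioned in the issue; the output folder name is an assumption, not from the thread):

    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    # Illustrative paths; substitute your own local checkpoint and output folders.
    model = GPT2LMHeadModel.from_pretrained("models/original/")
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

    special_tokens_dict = {
        'bos_token': '<|startoftext|>',
        'eos_token': '<|endoftext|>',
        'additional_special_tokens': [
            "<A>", "<B>", "<C>", "<D>", "<E>", "<F>", "<G>", "<H>"
        ]
    }
    tokenizer.add_special_tokens(special_tokens_dict)

    # Resize the embedding matrix to cover the new token ids, then resave both the
    # model and the tokenizer, so the folder the Trainer later reloads as a
    # checkpoint already has the larger embeddings.
    model.resize_token_embeddings(len(tokenizer))
    model.save_pretrained("models/original-resized/")
    tokenizer.save_pretrained("models/original-resized/")

With a checkpoint saved this way, pointing --model_name_or_path at the resized folder should avoid the mismatch, since the weights the Trainer reloads already match len(tokenizer).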