
BlenderBot-Distil-400M training fails if the input or target length exceeds a certain threshold, even when truncation and padding are enabled

See original GitHub issue

System Info

transformers version: 4.20.1, 4.21.0
Platform: Linux
Python version: 3.7.6
Huggingface_hub version: 0.8.1
PyTorch version (GPU?): 1.10.2 (Yes)
Tensorflow version (GPU?): not installed (NA)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using GPU in script?: Yes (2+ Tesla V100)
Using distributed or parallel set-up in script?: No

Who can help?

@patil-suraj

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

Run the following script with python script_blenderbot_length.py

# The contents of script_blenderbot_length.py
# To make the code crash, set CRITICAL_NUMBER=64
# To make it pass, set CRITICAL_NUMBER=63
# The code fails if EITHER the input or the target is repeated 64+ times.

from __future__ import annotations
import functools
import typing as tp
import datasets
import transformers
from transformers import (
    DataCollatorForSeq2Seq,
    PreTrainedTokenizer,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)


CRITICAL_NUMBER = 64


increment_en = [
    {"input": "One", "target": "Two"},
    {"input": "Three "*2, "target": "Four "*2},
    {"input": "Five "*4, "target": "Six "*4},
    {"input": "Seven "*8, "target": "Eight "*8},
    {"input": "Nine "*CRITICAL_NUMBER, "target": "Ten "*CRITICAL_NUMBER},
]
increment_en = increment_en * 100


def lod_to_dol(list_of_dicts: tp.List[tp.Dict[str, tp.Any]]) -> tp.Dict[str, list]:
    dict_of_lists = {
        key: [dct[key] for dct in list_of_dicts] for key in list_of_dicts[0]
    }
    return dict_of_lists


increment_en = lod_to_dol(increment_en)


def preprocess_function_(
    examples,
    tokenizer: PreTrainedTokenizer,
    max_input_length: int,
    max_target_length: int,
):
    inputs = examples["input"]
    targets = examples["target"]

    model_inputs = tokenizer(inputs, max_length=max_input_length, truncation=True)

    # Setup the tokenizer for targets
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(targets, max_length=max_target_length, truncation=True)

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs


def main():
    tokenizer = transformers.BlenderbotTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
    model = transformers.BlenderbotForConditionalGeneration.from_pretrained("facebook/blenderbot-400M-distill")

    args = Seq2SeqTrainingArguments(
        "script_debug",
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        fp16=True,
        push_to_hub=False,
        max_steps=10000,
        logging_steps=5000,
        save_steps=5000
    )

    data_collator = DataCollatorForSeq2Seq(tokenizer, model=model, padding=True)

    dataset = datasets.DatasetDict(
        {
            "train": datasets.Dataset.from_dict(increment_en),
            "test": datasets.Dataset.from_dict(increment_en),
        }
    )

    preprocess_function = functools.partial(
        preprocess_function_,
        tokenizer=tokenizer,
        max_input_length=512,
        max_target_length=512
    )

    processed_ds = dataset.map(preprocess_function, batched=True)
    processed_ds.set_format(
        type="torch", columns=["input_ids", "attention_mask", "labels"]
    )

    trainer = Seq2SeqTrainer(
        model,
        args,
        train_dataset=processed_ds["train"],
        eval_dataset=processed_ds["test"],
        data_collator=data_collator,
        tokenizer=tokenizer,
    )
    trainer.train()


if __name__ == "__main__":
    main()

Running the code when CRITICAL_NUMBER is set to 64 or greater leads to the following bizarre series of CUDA asserts:

<Similar messages appear above, which are omitted for brevity>
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [29,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [30,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
<The same `srcIndex < srcSelectDimSize` assertion repeats for threads [32,0,0] through [61,0,0] of block [2,0,0]; omitted for brevity>
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
  0%|                              | 0/10000 [00:07<?, ?it/s]
root@bolt-imq45r3c3y-8dfzr73qqa:/mnt/task_runtime# python script_blenderbot_length.py 
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00,  5.30ba/s]
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00,  5.72ba/s]
max_steps is given, it will override any value given in num_train_epochs
Using cuda_amp half precision backend
The following columns in the training set don't have a corresponding argument in `BlenderbotForConditionalGeneration.forward` and have been ignored: target, input. If target, input are not expected by `BlenderbotForConditionalGeneration.forward`,  you can safely ignore this message.
/miniconda/lib/python3.7/site-packages/transformers/optimization.py:310: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  FutureWarning,
***** Running training *****
  Num examples = 500
  Num Epochs = 313
  Instantaneous batch size per device = 4
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 10000
  0%|                              | 0/10000 [00:00<?, ?it/s]Traceback (most recent call last):
  File "script_blenderbot_length.py", line 101, in <module>
    main()
  File "script_blenderbot_length.py", line 97, in main
    trainer.train()
  File "/miniconda/lib/python3.7/site-packages/transformers/trainer.py", line 1502, in train
    ignore_keys_for_eval=ignore_keys_for_eval,
  File "/miniconda/lib/python3.7/site-packages/transformers/trainer.py", line 1740, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/miniconda/lib/python3.7/site-packages/transformers/trainer.py", line 2470, in training_step
    loss = self.compute_loss(model, inputs)
  File "/miniconda/lib/python3.7/site-packages/transformers/trainer.py", line 2502, in compute_loss
    outputs = model(**inputs)
  File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/miniconda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/miniconda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/miniconda/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/miniconda/lib/python3.7/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/miniconda/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/miniconda/lib/python3.7/site-packages/transformers/models/blenderbot/modeling_blenderbot.py", line 1340, in forward
    return_dict=return_dict,
  File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/miniconda/lib/python3.7/site-packages/transformers/models/blenderbot/modeling_blenderbot.py", line 1181, in forward
    return_dict=return_dict,
  File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/miniconda/lib/python3.7/site-packages/transformers/models/blenderbot/modeling_blenderbot.py", line 785, in forward
    output_attentions=output_attentions,
  File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/miniconda/lib/python3.7/site-packages/transformers/models/blenderbot/modeling_blenderbot.py", line 318, in forward
    output_attentions=output_attentions,
  File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/miniconda/lib/python3.7/site-packages/transformers/models/blenderbot/modeling_blenderbot.py", line 180, in forward
    query_states = self.q_proj(hidden_states) * self.scaling
  File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
  File "/miniconda/lib/python3.7/site-packages/torch/nn/functional.py", line 1848, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
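
The `srcIndex < srcSelectDimSize` asserts come from an embedding lookup that received an index larger than the embedding table, and the final `CUBLAS_STATUS_NOT_INITIALIZED` error is just a downstream symptom of the aborted kernels. A quick way to surface the underlying error (a sketch, assuming the same checkpoint as in the script above) is to push one over-long example through the model on CPU, or to re-run the script with `CUDA_LAUNCH_BLOCKING=1`:

# Sketch: run a single over-long example on CPU so the failure shows up
# as a readable Python exception instead of asynchronous CUDA asserts.
import torch
import transformers

tokenizer = transformers.BlenderbotTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
model = transformers.BlenderbotForConditionalGeneration.from_pretrained("facebook/blenderbot-400M-distill")

enc = tokenizer("Nine " * 64, max_length=512, truncation=True, return_tensors="pt")
with torch.no_grad():
    model(**enc, labels=enc["input_ids"])  # should fail here with a plain IndexError on CPU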

Expected behavior

The training code should not crash, especially when the sequences contain far fewer tokens than the 512-token truncation limit set in the script.
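
A sanity check on the tokenized lengths (a sketch using only the tokenizer and config) shows that the failing example is nowhere near the 512-token truncation limit, but does cross a much smaller model-side limit between 63 and 64 repetitions:

# Sketch: compare tokenized lengths just below and at the failing threshold
# against the number of position embeddings the checkpoint ships with.
from transformers import BlenderbotConfig, BlenderbotTokenizer

tokenizer = BlenderbotTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
config = BlenderbotConfig.from_pretrained("facebook/blenderbot-400M-distill")

for reps in (63, 64):
    ids = tokenizer("Nine " * reps, max_length=512, truncation=True)["input_ids"]
    print(f"{reps} repetitions -> {len(ids)} tokens "
          f"(max_position_embeddings = {config.max_position_embeddings})")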

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 11 (1 by maintainers)

Top GitHub Comments

1 reaction
shermansiu commented, Aug 3, 2022

Ah. So the issue is that in the BlenderbotConfig, max_position_embeddings is set to 128. The publicly available weights only have position embeddings with those dimensions, so either I’d have to train from scratch or reduce the max tokenizer length to 128.
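
Given that, a workaround that avoids retraining is to derive the truncation length from the checkpoint's config instead of hard-coding 512, so that neither inputs nor labels can outgrow the positional-embedding table. A sketch of the change to `main()` in the reproduction script:

# Sketch: cap tokenizer max_length at the model's positional-embedding size
# (128 for facebook/blenderbot-400M-distill) instead of hard-coding 512.
max_len = model.config.max_position_embeddings

preprocess_function = functools.partial(
    preprocess_function_,
    tokenizer=tokenizer,
    max_input_length=max_len,
    max_target_length=max_len,
)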

0 reactions
github-actions[bot] commented, Sep 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

Read more comments on GitHub >

Top Results From Across the Web

Padding and truncation - Hugging Face
Truncation works in the other direction by truncating long sequences. In most cases, padding your batch to the length of the longest sequence...
Read more >
How to make a Trainer pad inputs in a batch with huggingface ...
I've tried putting the padding and truncation parameters in the tokenizer, in the Trainer, and in the training_args. Nothing does. Any idea?
Read more >
Divide Hugging Face Transformers training time by 2 or more ...
Dynamic padding: we limit the number of added pad tokens to reach the length of the longest sequence of each mini batch instead...
Read more >
tokenization_utils.py - CodaLab Worksheets
Set a padding token or adjust the lengths of the sequences building the ... Returns: List[EncodingFast] or None if input was tokenized through...
Read more >
Informatica Truncation Issue - Data Management
I am facing a problem in that our source system can change the length of a field from say varchar2(5) to varchar2(10). When...
Read more >
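
The padding/truncation and dynamic-padding articles above boil down to two separate knobs, which the reproduction script also uses: truncate each example at tokenization time, then let a collator pad every mini-batch only up to its longest member. A minimal standalone sketch (independent of this issue, using the same tokenizer):

# Sketch: truncation at tokenization time, dynamic padding at batching time.
from transformers import BlenderbotTokenizer, DataCollatorWithPadding

tokenizer = BlenderbotTokenizer.from_pretrained("facebook/blenderbot-400M-distill")

# Truncation caps each sequence individually; no padding happens here.
features = [tokenizer(t, max_length=128, truncation=True)
            for t in ["One", "Three " * 2, "Five " * 4]]

# The collator pads each batch only to the longest sequence it contains.
collator = DataCollatorWithPadding(tokenizer, padding=True)
batch = collator(features)
print(batch["input_ids"].shape)  # (3, length_of_longest_sequence_in_batch)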
