BlenderBot-Distil-400M training fails if the input or target length exceeds a certain threshold, even when truncation and padding is on

System Info

transformers version: 4.20.1, 4.21.0
Platform: Linux
Python version: 3.7.6
Huggingface_hub version: 0.8.1
PyTorch version (GPU?): 1.10.2 (Yes)
Tensorflow version (GPU?): not installed (NA)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using GPU in script?: Yes (2+ Tesla V100)
Using distributed or parallel set-up in script?: No

Who can help?

@patil-suraj

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, …)
My own task or dataset (give details below)

Reproduction

Run the following script with `python script_blenderbot_length.py`:

# The contents of script_blenderbot_length.py
# To make the code crash, set CRITICAL_NUMBER=64
# To make it pass, set CRITICAL_NUMBER=63
# The code fails if EITHER the input or the target is repeated 64+ times.

from __future__ import annotations
import functools
import typing as tp
import datasets
import transformers
from transformers import (
    DataCollatorForSeq2Seq,
    PreTrainedTokenizer,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)


CRITICAL_NUMBER = 64


increment_en = [
    {"input": "One", "target": "Two"},
    {"input": "Three "*2, "target": "Four "*2},
    {"input": "Five "*4, "target": "Six "*4},
    {"input": "Seven "*8, "target": "Eight "*8},
    {"input": "Nine "*CRITICAL_NUMBER, "target": "Ten "*CRITICAL_NUMBER},
]
increment_en = increment_en * 100


def lod_to_dol(list_of_dicts: tp.List[tp.Dict[str, tp.Any]]) -> tp.Dict[str, list]:
    dict_of_lists = {
        key: [dct[key] for dct in list_of_dicts] for key in list_of_dicts[0]
    }
    return dict_of_lists


increment_en = lod_to_dol(increment_en)


def preprocess_function_(
    examples,
    tokenizer: PreTrainedTokenizer,
    max_input_length: int,
    max_target_length: int,
):
    inputs = examples["input"]
    targets = examples["target"]

    model_inputs = tokenizer(inputs, max_length=max_input_length, truncation=True)

    # Setup the tokenizer for targets
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(targets, max_length=max_target_length, truncation=True)

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs


def main():
    tokenizer = transformers.BlenderbotTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
    model = transformers.BlenderbotForConditionalGeneration.from_pretrained("facebook/blenderbot-400M-distill")

    args = Seq2SeqTrainingArguments(
        "script_debug",
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        fp16=True,
        push_to_hub=False,
        max_steps=10000,
        logging_steps=5000,
        save_steps=5000
    )

    data_collator = DataCollatorForSeq2Seq(tokenizer, model=model, padding=True)

    dataset = datasets.DatasetDict(
        {
            "train": datasets.Dataset.from_dict(increment_en),
            "test": datasets.Dataset.from_dict(increment_en),
        }
    )

    preprocess_function = functools.partial(
        preprocess_function_,
        tokenizer=tokenizer,
        max_input_length=512,
        max_target_length=512
    )

    processed_ds = dataset.map(preprocess_function, batched=True)
    processed_ds.set_format(
        type="torch", columns=["input_ids", "attention_mask", "labels"]
    )

    trainer = Seq2SeqTrainer(
        model,
        args,
        train_dataset=processed_ds["train"],
        eval_dataset=processed_ds["test"],
        data_collator=data_collator,
        tokenizer=tokenizer,
    )
    trainer.train()


if __name__ == "__main__":
    main()

Running the code with CRITICAL_NUMBER set to 64 or greater produces the following series of CUDA assertion failures, followed by a traceback:

<Similar messages appear above, which are omitted for brevity>
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [29,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [30,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [33,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [34,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [35,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [36,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [37,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [38,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [39,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [40,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [41,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [42,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [43,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [44,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [45,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [46,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [47,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [48,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [49,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [50,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [51,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [52,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [53,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [54,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [55,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [56,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [57,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [58,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [59,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [60,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [61,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
  0%|                              | 0/10000 [00:07<?, ?it/s]
root@bolt-imq45r3c3y-8dfzr73qqa:/mnt/task_runtime# python script_blenderbot_length.py 
100%|██████████████████████████| 1/1 [00:00<00:00,  5.30ba/s]
100%|██████████████████████████| 1/1 [00:00<00:00,  5.72ba/s]
max_steps is given, it will override any value given in num_train_epochs
Using cuda_amp half precision backend
The following columns in the training set don't have a corresponding argument in `BlenderbotForConditionalGeneration.forward` and have been ignored: target, input. If target, input are not expected by `BlenderbotForConditionalGeneration.forward`,  you can safely ignore this message.
/miniconda/lib/python3.7/site-packages/transformers/optimization.py:310: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  FutureWarning,
***** Running training *****
  Num examples = 500
  Num Epochs = 313
  Instantaneous batch size per device = 4
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 10000
  0%|                              | 0/10000 [00:00<?, ?it/s]Traceback (most recent call last):
  File "script_blenderbot_length.py", line 101, in <module>
    main()
  File "script_blenderbot_length.py", line 97, in main
    trainer.train()
  File "/miniconda/lib/python3.7/site-packages/transformers/trainer.py", line 1502, in train
    ignore_keys_for_eval=ignore_keys_for_eval,
  File "/miniconda/lib/python3.7/site-packages/transformers/trainer.py", line 1740, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/miniconda/lib/python3.7/site-packages/transformers/trainer.py", line 2470, in training_step
    loss = self.compute_loss(model, inputs)
  File "/miniconda/lib/python3.7/site-packages/transformers/trainer.py", line 2502, in compute_loss
    outputs = model(**inputs)
  File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/miniconda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/miniconda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/miniconda/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/miniconda/lib/python3.7/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/miniconda/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/miniconda/lib/python3.7/site-packages/transformers/models/blenderbot/modeling_blenderbot.py", line 1340, in forward
    return_dict=return_dict,
  File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/miniconda/lib/python3.7/site-packages/transformers/models/blenderbot/modeling_blenderbot.py", line 1181, in forward
    return_dict=return_dict,
  File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/miniconda/lib/python3.7/site-packages/transformers/models/blenderbot/modeling_blenderbot.py", line 785, in forward
    output_attentions=output_attentions,
  File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/miniconda/lib/python3.7/site-packages/transformers/models/blenderbot/modeling_blenderbot.py", line 318, in forward
    output_attentions=output_attentions,
  File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/miniconda/lib/python3.7/site-packages/transformers/models/blenderbot/modeling_blenderbot.py", line 180, in forward
    query_states = self.q_proj(hidden_states) * self.scaling
  File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
  File "/miniconda/lib/python3.7/site-packages/torch/nn/functional.py", line 1848, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
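
The `srcIndex < srcSelectDimSize` assertion means an index passed to an index-select style lookup (for example an embedding table) was out of range. CUDA reports such failures asynchronously, so the final cuBLAS error in the traceback is probably a downstream symptom rather than the cause. A pre-flight length check on the tokenized data can catch this before anything reaches the GPU. The sketch below is only an illustration: `find_overlong` and the `max_positions` parameter are hypothetical names, the toy batch stands in for a tokenized dataset, and it assumes a sequence of exactly `max_positions` tokens is still valid.

```python
from typing import Dict, List, Sequence, Tuple


def find_overlong(
    batch: Dict[str, Sequence[Sequence[int]]],
    max_positions: int,
    columns: Tuple[str, ...] = ("input_ids", "labels"),
) -> List[Tuple[str, int, int]]:
    """Return (column, row_index, length) for every sequence longer than max_positions."""
    offenders = []
    for column in columns:
        # Columns missing from the batch are skipped silently.
        for row, ids in enumerate(batch.get(column, [])):
            if len(ids) > max_positions:
                offenders.append((column, row, len(ids)))
    return offenders


# Toy stand-in for a tokenized dataset (hypothetical token ids).
toy_batch = {
    "input_ids": [[1] * 5, [1] * 129],
    "labels": [[1] * 128, [1] * 200],
}
offenders = find_overlong(toy_batch, max_positions=128)
for column, row, length in offenders:
    print(f"{column}[{row}] has {length} tokens, limit is 128")
```

Running the same script on CPU, or with the environment variable `CUDA_LAUNCH_BLOCKING=1`, usually makes the failing operation easier to identify.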

Expected behavior

The training code should not crash, especially when there are far fewer tokens than the tokenization limit.

Issue Analytics

Created a year ago
Comments: 11 (1 by maintainers)

Top GitHub Comments

1 reaction

shermansiu commented, Aug 3, 2022

Ah. So the issue is that in the BlenderbotConfig, max_position_embeddings is set to 128. The publicly available weights only have position embeddings with those dimensions, so either I’d have to train from scratch or reduce the max tokenizer length to 128.
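
One way to apply the second option is to derive the truncation length from the model configuration instead of hard-coding 512. The following is a minimal sketch, not the maintainers' fix: `clamp_max_length` is a hypothetical helper, and a `SimpleNamespace` stands in for `model.config` so the snippet runs without downloading the checkpoint.

```python
from types import SimpleNamespace


def clamp_max_length(requested: int, config) -> int:
    """Cap a requested sequence length at the config's max_position_embeddings, if present."""
    limit = getattr(config, "max_position_embeddings", None)
    if limit is None:
        # No positional limit advertised by the config: keep the requested value.
        return requested
    return min(requested, limit)


# Stand-in for model.config; the comment above reports 128 for this checkpoint.
config = SimpleNamespace(max_position_embeddings=128)

max_input_length = clamp_max_length(512, config)
max_target_length = clamp_max_length(512, config)
print(max_input_length, max_target_length)
```

In the reproduction script, the two clamped values would replace the literal 512 passed to `functools.partial`, with `model.config` passed in place of the stand-in.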

