BlenderBot-Distil-400M training fails if the input or target length exceeds a certain threshold, even when truncation and padding are on
System Info
- transformers version: 4.20.1, 4.21.0
- Platform: Linux
- Python version: 3.7.6
- Huggingface_hub version: 0.8.1
- PyTorch version (GPU?): 1.10.2 (Yes)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes (2+ Tesla V100)
- Using distributed or parallel set-up in script?: No
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
Run the following script with python script_blenderbot_length.py
# The contents of script_blenderbot_length.py
# To make the code crash, set CRITICAL_NUMBER=64
# To make it pass, set CRITICAL_NUMBER=63
# The code fails if EITHER the input or the target is repeated 64+ times.
from __future__ import annotations

import functools
import typing as tp

import datasets
import transformers
from transformers import (
    DataCollatorForSeq2Seq,
    PreTrainedTokenizer,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

CRITICAL_NUMBER = 64

# Each example repeats a word more often; the last one triggers the crash.
increment_en = [
    {"input": "One", "target": "Two"},
    {"input": "Three "*2, "target": "Four "*2},
    {"input": "Five "*4, "target": "Six "*4},
    {"input": "Seven "*8, "target": "Eight "*8},
    {"input": "Nine "*CRITICAL_NUMBER, "target": "Ten "*CRITICAL_NUMBER},
]
increment_en = increment_en * 100


def lod_to_dol(list_of_dicts: tp.List[tp.Dict[str, tp.Any]]) -> tp.Dict[str, list]:
    # Convert a list of dicts into a dict of lists, as Dataset.from_dict expects.
    dict_of_lists = {
        key: [dct[key] for dct in list_of_dicts] for key in list_of_dicts[0]
    }
    return dict_of_lists


increment_en = lod_to_dol(increment_en)


def preprocess_function_(
    examples,
    tokenizer: PreTrainedTokenizer,
    max_input_length: int,
    max_target_length: int,
):
    inputs = examples["input"]
    targets = examples["target"]
    model_inputs = tokenizer(inputs, max_length=max_input_length, truncation=True)
    # Setup the tokenizer for targets
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(targets, max_length=max_target_length, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs


def main():
    tokenizer = transformers.BlenderbotTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
    model = transformers.BlenderbotForConditionalGeneration.from_pretrained("facebook/blenderbot-400M-distill")
    args = Seq2SeqTrainingArguments(
        "script_debug",
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        fp16=True,
        push_to_hub=False,
        max_steps=10000,
        logging_steps=5000,
        save_steps=5000
    )
    data_collator = DataCollatorForSeq2Seq(tokenizer, model=model, padding=True)
    dataset = datasets.DatasetDict(
        {
            "train": datasets.Dataset.from_dict(increment_en),
            "test": datasets.Dataset.from_dict(increment_en),
        }
    )
    # Truncate to 512 tokens, the limit this issue assumes the model supports.
    preprocess_function = functools.partial(
        preprocess_function_,
        tokenizer=tokenizer,
        max_input_length=512,
        max_target_length=512
    )
    processed_ds = dataset.map(preprocess_function, batched=True)
    processed_ds.set_format(
        type="torch", columns=["input_ids", "attention_mask", "labels"]
    )
    trainer = Seq2SeqTrainer(
        model,
        args,
        train_dataset=processed_ds["train"],
        eval_dataset=processed_ds["test"],
        data_collator=data_collator,
        tokenizer=tokenizer,
    )
    trainer.train()


if __name__ == "__main__":
    main()
Running the code with CRITICAL_NUMBER set to 64 or greater leads to a bizarre series of CUDA asserts:
<Similar messages appear above, which are omitted for brevity>
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [29,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [30,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [33,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [34,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [35,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [36,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [37,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [38,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [39,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [40,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [41,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [42,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [43,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [44,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [45,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [46,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [47,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [48,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [49,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [50,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [51,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [52,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [53,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [54,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [55,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [56,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [57,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [58,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [59,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [60,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [61,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1640811797118/work/aten/src/ATen/native/cuda/Indexing.cu:699: indexSelectLargeIndex: block: [2,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
0%| | 0/10000 [00:07<?, ?it/s]
root@bolt-imq45r3c3y-8dfzr73qqa:/mnt/task_runtime# python script_blenderbot_length.py
100%|██████████████████████████| 1/1 [00:00<00:00, 5.30ba/s]
100%|██████████████████████████| 1/1 [00:00<00:00, 5.72ba/s]
max_steps is given, it will override any value given in num_train_epochs
Using cuda_amp half precision backend
The following columns in the training set don't have a corresponding argument in `BlenderbotForConditionalGeneration.forward` and have been ignored: target, input. If target, input are not expected by `BlenderbotForConditionalGeneration.forward`, you can safely ignore this message.
/miniconda/lib/python3.7/site-packages/transformers/optimization.py:310: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
FutureWarning,
***** Running training *****
Num examples = 500
Num Epochs = 313
Instantaneous batch size per device = 4
Total train batch size (w. parallel, distributed & accumulation) = 16
Gradient Accumulation steps = 1
Total optimization steps = 10000
0%| | 0/10000 [00:00<?, ?it/s]Traceback (most recent call last):
File "script_blenderbot_length.py", line 101, in <module>
main()
File "script_blenderbot_length.py", line 97, in main
trainer.train()
File "/miniconda/lib/python3.7/site-packages/transformers/trainer.py", line 1502, in train
ignore_keys_for_eval=ignore_keys_for_eval,
File "/miniconda/lib/python3.7/site-packages/transformers/trainer.py", line 1740, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/miniconda/lib/python3.7/site-packages/transformers/trainer.py", line 2470, in training_step
loss = self.compute_loss(model, inputs)
File "/miniconda/lib/python3.7/site-packages/transformers/trainer.py", line 2502, in compute_loss
outputs = model(**inputs)
File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/miniconda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/miniconda/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/miniconda/lib/python3.7/site-packages/torch/_utils.py", line 434, in reraise
raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/miniconda/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda/lib/python3.7/site-packages/transformers/models/blenderbot/modeling_blenderbot.py", line 1340, in forward
return_dict=return_dict,
File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda/lib/python3.7/site-packages/transformers/models/blenderbot/modeling_blenderbot.py", line 1181, in forward
return_dict=return_dict,
File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda/lib/python3.7/site-packages/transformers/models/blenderbot/modeling_blenderbot.py", line 785, in forward
output_attentions=output_attentions,
File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda/lib/python3.7/site-packages/transformers/models/blenderbot/modeling_blenderbot.py", line 318, in forward
output_attentions=output_attentions,
File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda/lib/python3.7/site-packages/transformers/models/blenderbot/modeling_blenderbot.py", line 180, in forward
query_states = self.q_proj(hidden_states) * self.scaling
File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 103, in forward
return F.linear(input, self.weight, self.bias)
File "/miniconda/lib/python3.7/site-packages/torch/nn/functional.py", line 1848, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
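The `srcIndex < srcSelectDimSize` assertion is raised by an index-select kernel and generally means that some index points outside the table being looked up, for example in an embedding layer. The cuBLAS error at the end of the traceback is plausibly a follow-on failure of that same device-side assert rather than a separate problem. A minimal sketch of a pre-flight check that flags sequences longer than a position limit is shown here. The helper names are hypothetical, the limit of 128 is the value reported in the comments on this issue, and the token counts are an assumption chosen only for illustration (two tokens per repeated word plus one end-of-sequence token); the real tokenizer output was not checked.

```python
from typing import List, Sequence, Tuple


def find_overlong(
    sequences: Sequence[Sequence[int]], max_positions: int
) -> List[Tuple[int, int]]:
    """Return (row index, length) for every sequence longer than max_positions."""
    return [
        (i, len(seq)) for i, seq in enumerate(sequences) if len(seq) > max_positions
    ]


def assumed_token_count(repeats: int) -> int:
    # ASSUMPTION for illustration only: two tokens per repeated word,
    # plus one end-of-sequence token. Not measured with the real tokenizer.
    return 2 * repeats + 1


MAX_POSITIONS = 128  # value reported for this checkpoint in the issue comments

# Stand-in token id lists; only their lengths matter for this check.
fake_batch = [[0] * assumed_token_count(n) for n in (1, 8, 63, 64)]

overlong = find_overlong(fake_batch, MAX_POSITIONS)
print(overlong)  # [(3, 129)]
```

Under that assumption, 63 repeats give 127 tokens and 64 repeats give 129, which would match the observed pass/fail boundary. In the real script the same check could be run over `processed_ds["train"]["input_ids"]` and the labels before calling `trainer.train()`. Rerunning on CPU, or with the `CUDA_LAUNCH_BLOCKING=1` environment variable set, usually produces a more readable error for this kind of failure.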
Expected behavior
The training code should not crash, especially when there are far fewer tokens than the tokenization limit.
Issue Analytics
- State:
- Created a year ago
- Comments:11 (1 by maintainers)
Top GitHub Comments
Ah. So the issue is that in the BlenderbotConfig, max_position_embeddings is set to 128. The publicly available weights only have position embeddings with those dimensions, so either I'd have to train from scratch or reduce the max tokenizer length to 128.

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
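The workaround described in the first comment, reducing the maximum tokenizer length to 128, can be sketched as a small helper that caps any requested length at the model's position-embedding capacity. This is a sketch under stated assumptions, not the maintainers' fix: `capped_max_length` is a hypothetical helper, and the constant 128 is hard-coded from the comment rather than read from the model.

```python
def capped_max_length(requested: int, max_position_embeddings: int) -> int:
    """Return a max_length that cannot exceed the position-embedding table size."""
    if requested <= 0 or max_position_embeddings <= 0:
        raise ValueError("lengths must be positive")
    return min(requested, max_position_embeddings)


# 128 is the max_position_embeddings value reported in the comment above
# for facebook/blenderbot-400M-distill (hard-coded here as an assumption).
MAX_POSITIONS = 128

# The reproduction script requested 512 for both inputs and targets.
max_input_length = capped_max_length(512, MAX_POSITIONS)
max_target_length = capped_max_length(512, MAX_POSITIONS)
print(max_input_length, max_target_length)  # 128 128
```

In the reproduction script, the capped values would replace the two `512` arguments passed to `functools.partial`, and the limit could be read from `model.config.max_position_embeddings` instead of being hard-coded.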