Dreambooth: RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasLtMatmul`
Describe the bug
Hi - I've spent a couple of days trying to get Dreambooth to run, and can't get past this:
Steps: 0%| | 0/800 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/scratch/StableDiffusion/diffusers/examples/dreambooth/train_dreambooth.py", line 765, in <module>
main()
File "/scratch/StableDiffusion/diffusers/examples/dreambooth/train_dreambooth.py", line 712, in main
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
return func(*args, **kwargs)
File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1673, in forward
loss = self.module(*inputs, **kwargs)
File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/diffusers/models/unet_2d_condition.py", line 287, in forward
emb = self.time_embedding(t_emb)
File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/diffusers/models/embeddings.py", line 75, in forward
sample = self.linear_1(sample)
File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasLtMatmul( ltHandle, computeDesc.descriptor(), &alpha_val, mat1_ptr, Adesc.descriptor(), mat2_ptr, Bdesc.descriptor(), &beta_val, result_ptr, Cdesc.descriptor(), result_ptr, Cdesc.descriptor(), &heuristicResult.algo, workspace.data_ptr(), workspaceSize, at::cuda::getCurrentCUDAStream())
Steps: 0%| | 0/800 [00:00<?, ?it/s]
[2022-10-31 12:46:24,888] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 711745
[2022-10-31 12:46:24,889] [ERROR] [launch.py:292:sigkill_handler] ['/home/stablediffusion/.conda/envs/diffusers/bin/python', '-u', 'train_dreambooth.py', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--instance_data_dir=training/dataset', '--class_data_dir=classes', '--output_dir=output', '--instance_prompt=MyObject dragon', '--class_prompt=dragon', '--seed=3434554', '--resolution=512', '--center_crop', '--train_batch_size=1', '--mixed_precision=fp16', '--use_8bit_adam', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=100', '--sample_batch_size=4', '--max_train_steps=800'] exits with return code = 1
Traceback (most recent call last):
File "/home/stablediffusion/.conda/envs/diffusers/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 827, in launch_command
deepspeed_launcher(args)
File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 540, in deepspeed_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['deepspeed', '--no_local_rank', '--num_gpus', '1', 'train_dreambooth.py', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--instance_data_dir=training/dataset', '--class_data_dir=classes', '--output_dir=output', '--instance_prompt=MyObject dragon', '--class_prompt=dragon', '--seed=3434554', '--resolution=512', '--center_crop', '--train_batch_size=1', '--mixed_precision=fp16', '--use_8bit_adam', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=100', '--sample_batch_size=4', '--max_train_steps=800']' returned non-zero exit status 1.
I can run other CUDA apps just fine. No other GPU-using apps are running.
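For reference, the failing call in the traceback is F.linear inside the UNet's time_embedding, which dispatches to cublasLtMatmul for fp16 inputs. A minimal standalone check along these lines (a sketch of my own, not part of the original run; the 320 -> 1280 layer size is an assumption roughly matching the SD 1.5 time embedding) exercises the same cuBLAS fp16 path outside the training script:

```python
# Standalone sanity check (sketch): run an fp16 F.linear -> cublasLtMatmul call
# in isolation, mirroring the op that fails in the traceback above.
# The 320 -> 1280 shape is an assumption; any fp16 Linear on the GPU takes the same path.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # set before CUDA init so errors surface at the real call site

import torch
import torch.nn as nn

device = torch.device("cuda")
layer = nn.Linear(320, 1280).to(device=device, dtype=torch.float16)
x = torch.randn(4, 320, device=device, dtype=torch.float16)

out = layer(x)            # F.linear, same call as linear.py line 114 in the traceback
torch.cuda.synchronize()  # force the kernel to finish so any CUDA error is raised here
print("fp16 linear OK:", tuple(out.shape), out.dtype)
```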
Reproduction
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export INSTANCE_DIR="training/dataset"
export CLASS_DIR="classes"
export OUTPUT_DIR="output"
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="MyObject dragon" \
  --class_prompt="dragon" \
  --seed=3434554 \
  --resolution=512 \
  --center_crop \
  --train_batch_size=1 \
  --mixed_precision="fp16" \
  --use_8bit_adam \
  --gradient_accumulation_steps=1 \
  --gradient_checkpointing \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=100 \
  --sample_batch_size=4 \
  --max_train_steps=800
Logs
See above.
System Info
- diffusers version: 0.7.0.dev0
- Platform: Linux-5.19.16-200.fc36.x86_64-x86_64-with-glibc2.35
- Python version: 3.9.13
- PyTorch version (GPU?): 1.13.0+cu116 (True)
- Huggingface_hub version: 0.10.1
- Transformers version: 4.23.1
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>
GPU is an RTX 3060 (12 GB), hence the need to limit memory usage.
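For completeness, a quick check (sketch, not part of the original report) to confirm the installed PyTorch CUDA build actually supports this card (the RTX 3060 is compute capability 8.6):

```python
# Environment sanity check (sketch): verify the PyTorch wheel's CUDA build matches the RTX 3060.
import torch

print("torch:", torch.__version__)                                 # 1.13.0+cu116 per the report above
print("CUDA build:", torch.version.cuda)                           # expected 11.6
print("device:", torch.cuda.get_device_name(0))
print("compute capability:", torch.cuda.get_device_capability(0))  # RTX 3060 -> (8, 6)
print("sm_86 compiled in:", "sm_86" in torch.cuda.get_arch_list())
```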
Top GitHub Comments
Yes, I am - thanks for the tip; will try it out as soon as a (currently running) hypernetwork training run completes and frees up the card! 😃
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.