
Dreambooth: RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasLtMatmul`

See original GitHub issue

Describe the bug

Hi - I’ve spent a couple of days trying to get Dreambooth to run, and I can’t get past this:

Steps:   0%|          | 0/800 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/scratch/StableDiffusion/diffusers/examples/dreambooth/train_dreambooth.py", line 765, in <module>
    main()
  File "/scratch/StableDiffusion/diffusers/examples/dreambooth/train_dreambooth.py", line 712, in main
    noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
  File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
    return func(*args, **kwargs)
  File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1673, in forward
    loss = self.module(*inputs, **kwargs)
  File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/diffusers/models/unet_2d_condition.py", line 287, in forward
    emb = self.time_embedding(t_emb)
  File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/diffusers/models/embeddings.py", line 75, in forward
    sample = self.linear_1(sample)
  File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasLtMatmul(ltHandle, computeDesc.descriptor(), &alpha_val, mat1_ptr, Adesc.descriptor(), mat2_ptr, Bdesc.descriptor(), &beta_val, result_ptr, Cdesc.descriptor(), result_ptr, Cdesc.descriptor(), &heuristicResult.algo, workspace.data_ptr(), workspaceSize, at::cuda::getCurrentCUDAStream())
Steps:   0%|          | 0/800 [00:00<?, ?it/s]
[2022-10-31 12:46:24,888] [INFO] [launch.py:286:sigkill_handler] Killing subprocess 711745
[2022-10-31 12:46:24,889] [ERROR] [launch.py:292:sigkill_handler] ['/home/stablediffusion/.conda/envs/diffusers/bin/python', '-u', 'train_dreambooth.py', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--instance_data_dir=training/dataset', '--class_data_dir=classes', '--output_dir=output', '--instance_prompt=MyObject dragon', '--class_prompt=dragon', '--seed=3434554', '--resolution=512', '--center_crop', '--train_batch_size=1', '--mixed_precision=fp16', '--use_8bit_adam', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=100', '--sample_batch_size=4', '--max_train_steps=800'] exits with return code = 1
Traceback (most recent call last):
  File "/home/stablediffusion/.conda/envs/diffusers/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 827, in launch_command
    deepspeed_launcher(args)
  File "/home/stablediffusion/.conda/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 540, in deepspeed_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['deepspeed', '--no_local_rank', '--num_gpus', '1', 'train_dreambooth.py', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--instance_data_dir=training/dataset', '--class_data_dir=classes', '--output_dir=output', '--instance_prompt=MyObject dragon', '--class_prompt=dragon', '--seed=3434554', '--resolution=512', '--center_crop', '--train_batch_size=1', '--mixed_precision=fp16', '--use_8bit_adam', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=100', '--sample_batch_size=4', '--max_train_steps=800']' returned non-zero exit status 1.
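
One caveat before digging in (not from the original report): CUDA errors are reported asynchronously, so the F.linear frame above is where the error surfaced, not necessarily where it occurred. A minimal debugging sketch is to force synchronous launches and rerun:

# Debug-only: synchronous kernel launches make the Python traceback point
# at the kernel that actually failed, at the cost of much slower training.
export CUDA_LAUNCH_BLOCKING=1
# ...then rerun the exact command from the Reproduction section below.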

I can run other CUDA apps just fine. No other GPU-using apps are running.
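
One way to make that check more targeted (a sketch, not from the thread): exercise the same fp16 linear path that failed, in isolation. The 320-to-1280 shape here mirrors the Stable Diffusion v1 time-embedding layer (linear_1) named in the traceback, and an fp16 linear on the GPU typically dispatches to the same cuBLASLt matmul:

# Hypothetical standalone check: if this fails too, the CUDA/cuBLAS/driver
# stack is at fault rather than the training script.
python -c "
import torch
x = torch.randn(4, 320, device='cuda', dtype=torch.float16)
layer = torch.nn.Linear(320, 1280).cuda().half()
print(layer(x).float().sum().item())
"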

Reproduction

export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export INSTANCE_DIR="training/dataset"
export CLASS_DIR="classes"
export OUTPUT_DIR="output"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="MyObject dragon" \
  --class_prompt="dragon" \
  --seed=3434554 \
  --resolution=512 \
  --center_crop \
  --train_batch_size=1 \
  --mixed_precision="fp16" \
  --use_8bit_adam \
  --gradient_accumulation_steps=1 \
  --gradient_checkpointing \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=100 \
  --sample_batch_size=4 \
  --max_train_steps=800
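
Note that although this invokes accelerate launch, the traceback above goes through deepspeed/runtime/engine.py, so the saved Accelerate configuration evidently has DeepSpeed enabled. To confirm what the launcher will actually do (a standard Accelerate command, not part of the original report):

# Prints the Accelerate environment, including the saved default config -
# check whether distributed_type is set to DEEPSPEED.
accelerate env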

Logs

See above.

System Info

  • diffusers version: 0.7.0.dev0
  • Platform: Linux-5.19.16-200.fc36.x86_64-x86_64-with-glibc2.35
  • Python version: 3.9.13
  • PyTorch version (GPU?): 1.13.0+cu116 (True)
  • Huggingface_hub version: 0.10.1
  • Transformers version: 4.23.1
  • Using GPU in script?: <fill in>
  • Using distributed or parallel set-up in script?: <fill in>

The GPU is an RTX 3060 (12 GB), hence the need to limit memory usage.
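
Since CUBLAS_STATUS_EXECUTION_FAILED is sometimes an out-of-memory condition in disguise, it is worth confirming how much of that 12 GB is actually free right before launching (a routine check, not from the thread):

# Show VRAM usage; a desktop session or a previous crashed run that is
# still holding memory eats into the 12 GB budget.
nvidia-smi --query-gpu=memory.used,memory.total --format=csv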

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
enn-nafnlaus commented, Nov 9, 2022

Are you using DeepSpeed for training? If so, I would suggest removing the --use_8bit_adam option, as it doesn’t play well with DeepSpeed AFAIK.

Yes, I am - thanks for the tip; I’ll try it out as soon as a (currently running) hypernetwork training run completes and frees up the card! 😃
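
For reference, the suggested workaround amounts to rerunning the Reproduction command with the 8-bit Adam flag dropped (a sketch, assuming everything else in the setup stays the same; without the flag the script falls back to its default AdamW optimizer):

# Same launch as in the Reproduction section, minus --use_8bit_adam,
# which reportedly conflicts with DeepSpeed.
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="MyObject dragon" \
  --class_prompt="dragon" \
  --seed=3434554 \
  --resolution=512 \
  --center_crop \
  --train_batch_size=1 \
  --mixed_precision="fp16" \
  --gradient_accumulation_steps=1 \
  --gradient_checkpointing \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=100 \
  --sample_batch_size=4 \
  --max_train_steps=800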

0 reactions
github-actions[bot] commented, Dec 4, 2022

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.


