
DreamBooth fails when training with the 8-bit Adam optimizer.

See original GitHub issue

Describe the bug

The 8-bit Adam optimizer appears to fail when training DreamBooth.

The command I used:

python train_dreambooth.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --instance_data_dir=$instance_folder \
  --class_data_dir=$class_folder \
  --output_dir=$model_checkpoints_folder \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=2 \
  --gradient_checkpointing \
  --use_8bit_adam \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=800

Reproduction

No response

Logs


RuntimeError: set_sizes_and_strides is not allowed on a Tensor created from .data or .detach().
If your intent is to change the metadata of a Tensor (such as sizes / strides / storage / storage_offset)
without autograd tracking the change, remove the .data / .detach() call and wrap the change in a `with torch.no_grad():` block.
For example, change:
    x.data.set_(y)
to:
    with torch.no_grad():
        x.set_(y)
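
The error message above points at a deprecated PyTorch pattern: mutating tensor metadata through `.data`. A minimal standalone sketch of the fix it suggests (the tensor names `x` and `y` are illustrative, not taken from the training script):

```python
import torch

x = torch.zeros(3, requires_grad=True)
y = torch.ones(5)

# Deprecated pattern that newer PyTorch builds reject with
# "set_sizes_and_strides is not allowed on a Tensor created from .data or .detach()":
#     x.data.set_(y)

# Recommended pattern: perform the in-place metadata change
# inside a no_grad block so autograd does not track it.
with torch.no_grad():
    x.set_(y)

print(tuple(x.shape))  # the tensor now shares y's storage and shape
```

In this issue the offending `.data.set_` call lives inside a dependency (likely bitsandbytes' 8-bit Adam or an older PyTorch), so the practical remedy reported below is upgrading the environment rather than patching user code.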

System Info

  • diffusers version: 0.4.0
  • Platform: Linux-5.8.0-44-generic-x86_64-with-glibc2.29
  • Python version: 3.8.10
  • PyTorch version (GPU?): torch 1.10
  • Huggingface_hub version: 0.10.0
  • Transformers version: 4.22.2
  • Using GPU in script?: <fill in>
  • Using distributed or parallel set-up in script?: <fill in>

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
hervenivon commented, Oct 27, 2022

@paudom, I faced the same issue in a conda environment created with pytorch==1.10.2 and cudatoolkit=11.3.1. I solved it by running inside a Docker container using their latest image, nvcr.io/nvidia/pytorch:22.09-py3.

1 reaction
paudom commented, Oct 14, 2022

@nanlliu Could you please tell me which PyTorch version or setup you finally used? Thanks! I’m having the same issue.

Read more comments on GitHub >

Top Results From Across the Web

Dreambooth broken, possibly because of ADAM optimizer ...
Ya imagine that's impossible if the entire latent space is getting regularized and forgetting everything it learned in its original training.
Read more >
[D] Dreambooth Stable Diffusion training in just 12.5 GB ...
[D] Dreambooth Stable Diffusion training in just 12.5 GB VRAM, using the 8bit adam optimizer from bitsandbytes along with xformers while being 2 ......
Read more >
Accelerate + Multi-GPU+ Automatic1111 + Dreambooth ...
I'm currently trying to use accelerate to run Dreambooth via ... 8 Bit Adam = Yes ... Applying xformers cross attention optimization.
Read more >
Stable Diffusion Tutorial Part 1: Run Dreambooth in Gradient ...
In this tutorial, we will walk step-by-step through the setup, training, and inference of a Dreambooth Stable Diffusion model within a ...
Read more >
How to Fine-tune Stable Diffusion using Dreambooth
This tutorial focuses on how to fine-tune Stable Diffusion using another method called Dreambooth. Unlike textual inversion method which train ...
Read more >
