
DreamBooth fails when training with the 8-bit Adam optimizer.

See original GitHub issue

Describe the bug

The 8-bit Adam optimizer appears to fail when training DreamBooth.

The command I used:

python train_dreambooth.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --instance_data_dir=$instance_folder \
  --class_data_dir=$class_folder \
  --output_dir=$model_checkpoints_folder \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=2 \
  --gradient_checkpointing \
  --use_8bit_adam \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=800

Reproduction

No response

Logs


RuntimeError: set_sizes_and_strides is not allowed on a Tensor created from .data or .detach().
If your intent is to change the metadata of a Tensor (such as sizes / strides / storage / storage_offset)
without autograd tracking the change, remove the .data / .detach() call and wrap the change in a `with torch.no_grad():` block.
For example, change:
    x.data.set_(y)
to:
    with torch.no_grad():
        x.set_(y)
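
The error message above points at a deprecated PyTorch pattern: mutating tensor metadata through `.data`. A minimal standalone sketch of the fix it suggests (the tensor names `x` and `y` are illustrative, not taken from the training script):

```python
import torch

x = torch.zeros(3, requires_grad=True)
y = torch.ones(5)

# Deprecated pattern that newer PyTorch builds reject with
# "set_sizes_and_strides is not allowed on a Tensor created from .data or .detach()":
#     x.data.set_(y)

# Recommended pattern: perform the in-place metadata change
# inside a no_grad block so autograd does not track it.
with torch.no_grad():
    x.set_(y)

print(tuple(x.shape))  # the tensor now shares y's storage and shape
```

In this issue the offending `.data.set_` call lives inside a dependency (likely bitsandbytes' 8-bit Adam or an older PyTorch), so the practical remedy reported below is upgrading the environment rather than patching user code.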

System Info

  • diffusers version: 0.4.0
  • Platform: Linux-5.8.0-44-generic-x86_64-with-glibc2.29
  • Python version: 3.8.10
  • PyTorch version (GPU?): torch 1.10
  • Huggingface_hub version: 0.10.0
  • Transformers version: 4.22.2
  • Using GPU in script?: <fill in>
  • Using distributed or parallel set-up in script?: <fill in>

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
hervenivon commented, Oct 27, 2022

@paudom, I faced the same issue in a conda environment created with pytorch==1.10.2 and cudatoolkit=11.3.1. I solved it by running inside a Docker container using their latest image, nvcr.io/nvidia/pytorch:22.09-py3.

1 reaction
paudom commented, Oct 14, 2022

@nanlliu Could you please tell me which PyTorch version or setup you finally used? Thanks! I’m having the same issue.

Read more comments on GitHub >

Top Results From Across the Web

Dreambooth broken, possibly because of ADAM optimizer ...
Ya imagine that's impossible if the entire latent space is getting regularized and forgetting everything it learned in its original training.
Read more >
[D] Dreambooth Stable Diffusion training in just 12.5 GB ...
[D] Dreambooth Stable Diffusion training in just 12.5 GB VRAM, using the 8bit adam optimizer from bitsandbytes along with xformers while being 2 ......
Read more >
Accelerate + Multi-GPU+ Automatic1111 + Dreambooth ...
I'm currently trying to use accelerate to run Dreambooth via ... 8 Bit Adam = Yes ... Applying xformers cross attention optimization.
Read more >
Stable Diffusion Tutorial Part 1: Run Dreambooth in Gradient ...
In this tutorial, we will walk step-by-step through the setup, training, and inference of a Dreambooth Stable Diffusion model within a ...
Read more >
How to Fine-tune Stable Diffusion using Dreambooth
This tutorial focuses on how to fine-tune Stable Diffusion using another method called Dreambooth. Unlike textual inversion method which train ...
Read more >
