[Feature Request] Dreambooth - Save intermediate checkpoints
Is your feature request related to a problem? Please describe.
Dreambooth output quality can change drastically between step counts, including for the worse if the chosen learning rate is too high for the step count or for the number of training / regularization images. This implementation only saves the model after training is finished, which requires full reruns to compare different step counts and makes it impossible to salvage an overfitted model.
Describe the solution you’d like
A configurable way to save the model at certain step counts and continue training afterwards.
Optimally, the script would accept two new parameters: one to specify the step interval to save at, and one to specify how many checkpoints to keep before overwriting. In some of the popular non-diffusers implementations, such as https://github.com/XavierXiao/Dreambooth-Stable-Diffusion and its forks, these arguments are called `every_n_train_steps` and `save_top_k`. However, since this implementation doesn't generate intermediate checkpoints by default, it would probably be better to find more descriptive names.
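For illustration, a rough sketch of what such a hook could look like in this training script is shown below. The argument name `save_every_n_steps`, the helper name, and the surrounding variables (`args`, `accelerator`, `unet`, `text_encoder`, `global_step`) are assumptions that only mirror what the script already does for its final save; this is not an existing interface.

```python
from diffusers import StableDiffusionPipeline

def maybe_save_checkpoint(args, accelerator, unet, text_encoder, global_step):
    """Save a full pipeline snapshot every `args.save_every_n_steps` optimizer steps.

    Sketch only: `save_every_n_steps` is a hypothetical argument, and the other
    parameters are assumed to match the variables used in the training loop.
    """
    if not args.save_every_n_steps or global_step % args.save_every_n_steps != 0:
        return
    # Make sure all processes have finished the current step before saving.
    accelerator.wait_for_everyone()
    if accelerator.is_main_process:
        pipeline = StableDiffusionPipeline.from_pretrained(
            args.pretrained_model_name_or_path,
            # Pass the unwrapped modules, not the accelerate-prepared ones.
            unet=accelerator.unwrap_model(unet),
            text_encoder=accelerator.unwrap_model(text_encoder),
        )
        pipeline.save_pretrained(f"{args.output_dir}/checkpoint-{global_step}")
```

Training would simply continue after the call, which also covers the "continue training afterwards" part of the request; the keep-N-checkpoints half is sketched further below.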
Describe alternatives you've considered
Technically, it would also be possible to manually resume training from a previous checkpoint and use low step counts for each run. However, this requires additional effort and is hard to do in some Colabs based on this implementation, so an integrated solution would be preferred.
Additional context
I tried a naive implementation by simply calling `pipeline.save_pretrained` after every X steps; however, this leads to an error after successfully saving a few files:
File "/diffusers/pipeline_utils.py", line 158, in save_pretrained save_method = getattr(sub_model, save_method_name)
TypeError: getattr(): attribute name must be string
I called the method in the same way as the final save, including a call to `accelerator.wait_for_everyone()` beforehand, as suggested in the Accelerate documentation. Since I am not familiar with the Accelerate and Stable Diffusion architectures, I haven't been able to find out why so far, but from the error message it seems that `StableDiffusionPipeline` could not find a valid save method name because some information about the model is missing at this point.
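One plausible explanation (a guess based on the traceback, not something confirmed here): the `TypeError` means `save_method_name` ended up as `None` for one of the pipeline components, which happens when that component's class is not one `save_pretrained` recognizes, for example an accelerate/DDP-wrapped module or a component that is `None` (as in the safety-checker case mentioned in the comments below). If the wrapped module is the culprit, unwrapping it before constructing the pipeline should avoid the error; in the fragment below, `args`, `accelerator`, `unet`, and `save_path` stand in for the training-loop variables.

```python
from diffusers import StableDiffusionPipeline

# Suspected failing pattern: the UNet handed to the pipeline is still the
# module returned by accelerator.prepare(), so its class is not recognized.
#
#     pipeline = StableDiffusionPipeline.from_pretrained(
#         args.pretrained_model_name_or_path, unet=unet)
#     pipeline.save_pretrained(save_path)
#     # -> TypeError: getattr(): attribute name must be string

# Unwrapping first lets save_pretrained see the real model class:
pipeline = StableDiffusionPipeline.from_pretrained(
    args.pretrained_model_name_or_path,
    unet=accelerator.unwrap_model(unet),
)
pipeline.save_pretrained(save_path)
```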
Top GitHub Comments
@DominikDoom Thanks a lot for the issue, working on adding intermediate checkpoint saving.
@Cyberes It seems that the safety checker is not saved in the model that you are passing; that's what the error indicates. Make sure the safety checker is also saved there. Feel free to open an issue if the error persists even after that.
Fixed by #1668 (except keeping the last `n` checkpoints, to be adapted from https://github.com/huggingface/accelerate/issues/914).
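For the still-missing "keep only the last n checkpoints" part, a minimal sketch of one way to do it is shown below. It assumes intermediate checkpoints are written to directories named `checkpoint-<step>` inside the output directory (a naming convention chosen here for illustration, not something defined by the linked fix):

```python
import os
import shutil

def prune_checkpoints(output_dir: str, keep_last_n: int) -> None:
    """Delete the oldest 'checkpoint-<step>' directories, keeping the newest `keep_last_n`.

    Assumes keep_last_n >= 1 and the hypothetical 'checkpoint-<global_step>'
    directory layout described above.
    """
    checkpoints = sorted(
        (d for d in os.listdir(output_dir)
         if d.startswith("checkpoint-") and os.path.isdir(os.path.join(output_dir, d))),
        key=lambda name: int(name.split("-")[-1]),  # order by training step, oldest first
    )
    for stale in checkpoints[:-keep_last_n]:
        shutil.rmtree(os.path.join(output_dir, stale))
```

Calling e.g. `prune_checkpoints(args.output_dir, 3)` right after each intermediate save would keep only the three most recent checkpoints, i.e. pruning by recency rather than by a monitored metric as `save_top_k` does in the Lightning-based forks.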