
[Feature Request] Dreambooth - Save intermediate checkpoints

See original GitHub issue

Is your feature request related to a problem? Please describe.
Dreambooth can change its output quality drastically between step counts, including for the worse if the chosen learning rate is too high for the step count or for the number of training / regularization images. This implementation only saves the model after training has finished, which requires full reruns to compare different step counts and also makes it impossible to salvage an overfitted model.

Describe the solution you’d like
A configurable way to save the model at certain step counts and continue training afterwards. Ideally, the script would accept two new parameters: one to specify the step interval to save at, and one to specify how many checkpoints to keep before overwriting. In some of the popular non-diffusers implementations, such as https://github.com/XavierXiao/Dreambooth-Stable-Diffusion and the forks derived from it, these arguments are called every_n_train_steps and save_top_k. However, since this implementation doesn’t generate intermediate checkpoints by default, more descriptive names would probably be a better fit.
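For illustration only, a minimal sketch of what such an option might look like in an accelerate-based training loop; the argument names --save_steps and --save_total_limit and the helper save_intermediate_checkpoint are hypothetical placeholders, not existing flags or functions of this script:

# Hypothetical arguments; the real names would be chosen by the maintainers.
parser.add_argument("--save_steps", type=int, default=None,
                    help="Save an intermediate checkpoint every N optimizer steps.")
parser.add_argument("--save_total_limit", type=int, default=None,
                    help="Keep at most this many intermediate checkpoints.")

# Inside the training loop, after each optimizer step:
if args.save_steps is not None and global_step % args.save_steps == 0:
    save_intermediate_checkpoint(accelerator, unet, text_encoder, args, global_step)

A possible shape for save_intermediate_checkpoint is sketched under “Additional context” below.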

Describe alternatives you’ve considered
Technically, it would also be possible to manually resume training from a previous checkpoint and use low step counts for each run. However, this requires additional effort and is hard to do in some Colabs based on this implementation, so an integrated solution would be preferable.

Additional context
I tried a naive implementation by simply calling pipeline.save_pretrained after every X steps, but this led to an error after successfully saving a few files:

File "/diffusers/pipeline_utils.py", line 158, in save_pretrained save_method = getattr(sub_model, save_method_name)
TypeError: getattr(): attribute name must be string

I called the method in the same way as the final save, including a call to accelerator.wait_for_everyone() beforehand, as suggested in the Accelerate documentation. Since I am not familiar with the Accelerate and Stable Diffusion architectures, I haven’t been able to find out why so far, but from the error message it seems that StableDiffusionPipeline could not find a valid save method name for one of its sub-models, presumably because some information about the model is missing at this point.
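For reference, the final save in the example script only runs on the main process and builds the pipeline from unwrapped models. One plausible cause of the error above is that save_pretrained ends up iterating over a sub-model whose save method it cannot resolve, e.g. an accelerate/DDP-wrapped module or a missing registered component. A hedged sketch of a periodic save that mirrors the final save (the function name and its exact arguments are illustrative, not part of the script):

import os

from diffusers import StableDiffusionPipeline


def save_intermediate_checkpoint(accelerator, unet, text_encoder, args, global_step):
    # Make sure all processes have finished the current step before saving.
    accelerator.wait_for_everyone()
    if accelerator.is_main_process:
        # Unwrap the accelerate/DDP wrappers so save_pretrained sees plain
        # diffusers/transformers modules and can resolve their save methods.
        pipeline = StableDiffusionPipeline.from_pretrained(
            args.pretrained_model_name_or_path,
            unet=accelerator.unwrap_model(unet),
            text_encoder=accelerator.unwrap_model(text_encoder),
        )
        save_dir = os.path.join(args.output_dir, f"checkpoint-{global_step}")
        pipeline.save_pretrained(save_dir)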

Issue Analytics

  • State: closed
  • Created: a year ago
  • Reactions: 2
  • Comments: 14 (11 by maintainers)

Top GitHub Comments

3 reactions
patil-suraj commented, Oct 20, 2022

@DominikDoom Thanks a lot for the issue, working on adding intermediate checkpoint saving.

@Cyberes It seems that the safety checker is not saved in the model you are passing; that’s what the error indicates. Make sure the safety checker is also saved there. Feel free to open an issue if the error persists even after that.

0 reactions
pcuenca commented, Dec 13, 2022

Fixed by #1668 (except keeping the last n checkpoints, to be adapted from https://github.com/huggingface/accelerate/issues/914).
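For the keep-last-n part that is still open, a minimal sketch of one way to prune old checkpoints, assuming intermediate checkpoints are written to checkpoint-<step> directories under the output directory (the helper name prune_checkpoints is illustrative, not an existing function):

import os
import re
import shutil


def prune_checkpoints(output_dir, keep_last_n):
    # Collect checkpoint-<step> directories and sort them by step number.
    pattern = re.compile(r"^checkpoint-(\d+)$")
    checkpoints = []
    for name in os.listdir(output_dir):
        match = pattern.match(name)
        if match and os.path.isdir(os.path.join(output_dir, name)):
            checkpoints.append((int(match.group(1)), name))
    checkpoints.sort()
    # Remove the oldest directories so that at most keep_last_n remain.
    excess = len(checkpoints) - keep_last_n
    for _, name in checkpoints[:max(excess, 0)]:
        shutil.rmtree(os.path.join(output_dir, name))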


Top Results From Across the Web

  • Stable Diffusion Tutorial Part 1: Run Dreambooth in Gradient ...
    In this article, we walked through each of the steps for creating a Dreambooth concept from scratch within a Gradient Notebook, ...
  • New (simple) Dreambooth method is out, train under 10 ...
    New (simple) Dreambooth method is out, train under 10 minutes without class images on multiple subjects, retrainable-ish model.
  • Models - Hugging Face
    ... feature can be referred to as “activation checkpointing” or “checkpoint activations”. ... A path to a directory containing model weights saved using ...
  • Personalizing Text-to-Image Generation using Textual Inversion
    The primary concern shared by reviewers is a request for additional ... To clarify: we are not using intermediate features or mixing inputs ...
  • dreambooth-for-diffusion | Kaggle
    If you have downloaded this checkpoint from the Hugging Face Hub, it would be very nice if you could open a Pull request ...
