[Dreambooth Example] Attempting to unscale FP16 gradients.
Describe the bug
The training script was working fine, but after updating diffusers to 0.7.2 I now get the following error:
Traceback (most recent call last):
  File "/tmp/pycharm_project_990/train_dreambooth.py", line 938, in <module>
    main(args)
  File "/tmp/pycharm_project_990/train_dreambooth.py", line 876, in main
    optimizer.step()
  File "/opt/conda/envs/dreambooth/lib/python3.7/site-packages/accelerate/optimizer.py", line 134, in step
    self.scaler.step(self.optimizer, closure)
  File "/opt/conda/envs/dreambooth/lib/python3.7/site-packages/torch/cuda/amp/grad_scaler.py", line 337, in step
    self.unscale_(optimizer)
  File "/opt/conda/envs/dreambooth/lib/python3.7/site-packages/torch/cuda/amp/grad_scaler.py", line 282, in unscale_
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(optimizer, inv_scale, found_inf, False)
  File "/opt/conda/envs/dreambooth/lib/python3.7/site-packages/torch/cuda/amp/grad_scaler.py", line 210, in _unscale_grads_
    raise ValueError("Attempting to unscale FP16 gradients.")
ValueError: Attempting to unscale FP16 gradients.
Steps:   0%|          | 0/800 [00:18<?, ?it/s]
Any ideas, or do I need to downgrade?
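For context (my reconstruction of the cause, not an official explanation): PyTorch's GradScaler raises exactly this ValueError when the gradients it is asked to unscale are already fp16, which happens whenever the parameters being optimized are themselves half precision. A minimal sketch that triggers it, assuming a CUDA device:

```python
import torch

# Minimal sketch of the failure mode (assumed cause, not the actual
# training script): fp16 parameters produce fp16 gradients, and
# GradScaler.unscale_() rejects those. AMP expects fp32 master weights,
# with only the forward/backward compute running in half precision.
model = torch.nn.Linear(4, 4).cuda().half()  # parameters are fp16
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(2, 4, device="cuda")
with torch.cuda.amp.autocast():
    loss = model(x).float().sum()

scaler.scale(loss).backward()   # gradients land in fp16
scaler.step(optimizer)          # ValueError: Attempting to unscale FP16 gradients.
```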
Reproduction
No response
Logs
No response
System Info
- diffusers 0.7.2
- python 3.7.12
- accelerate 0.14.0
Issue Analytics
- Created: 10 months ago
- Comments: 26 (11 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Thanks for the detailed issue, taking a look now.
Hi all, sorry for the radio silence… some time-sensitive matters snuck up on me. I hope one of the other contributors on this issue can confirm the fix; otherwise I hope to try it out on Sunday and will report back after.
Thank you both @patil-suraj and @patrickvonplaten for your amazing and quick work here! (And thanks @patil-suraj, I indeed got Dreambooth working with fp32 too; it kind of fixed itself, but I think I had been loading one of the components from an incompatible model.)
🙏
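For readers landing here later: the resolution discussed above boils down to not loading the trainable model in half precision. A sketch of that pattern, assuming accelerate handles the mixed precision (the model id and component layout are illustrative, not the merged patch):

```python
import torch
from accelerate import Accelerator
from diffusers import AutoencoderKL, UNet2DConditionModel

# Sketch based on the discussion above; the checkpoint is an example.
accelerator = Accelerator(mixed_precision="fp16")
weight_dtype = torch.float16 if accelerator.mixed_precision == "fp16" else torch.float32

model_id = "CompVis/stable-diffusion-v1-4"  # example checkpoint
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")

# Frozen components can safely run in fp16 to save memory...
vae.to(accelerator.device, dtype=weight_dtype)
# ...but the trainable unet must stay in fp32: the GradScaler that
# accelerate wraps around optimizer.step() refuses to unscale fp16
# gradients, which is exactly the ValueError in this issue.
unet.to(accelerator.device)
```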