
Still a problem with fp16 and without autocast

See original GitHub issue

Describe the bug

Hello! This issue has been discussed often and, as far as I understand, it has since been fixed. But I still get a

RuntimeError: expected scalar type Half but found Float

when I try to run the model with fp16 but without autocast.
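For comparison, the autocast-wrapped variant is the one that runs without this error here; a minimal sketch (the prompt and token are placeholders, not taken from my actual script):

```
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
    use_auth_token=True,  # placeholder; the real token goes here
).to("cuda:0")

prompt = "a photo of an astronaut riding a horse on mars"  # placeholder prompt
with torch.autocast("cuda"):
    # autocast casts ops to a common dtype on the fly, which masks the
    # Half-vs-Float mismatch reported below
    image = pipe(prompt)["sample"][0]
```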

My code:

import torch
from diffusers import StableDiffusionPipeline

pipe0 = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    use_auth_token="hf_LFWSneVmdLYPKbkIRpCrCKVxx",
    revision="fp16",
    torch_dtype=torch.float16
).to("cuda:0")

image = pipe0(a, num_inference_steps=int(d), width=int(e), height=int(f), guidance_scale=float(c))["sample"]

I am on a GPU (RTX 3090) and have the latest build, 0.4.1. I tried it with different schedulers, but if I do not use autocast I get this error message (with the default scheduler):

  File "generator1GPU.py", line 134, in heron1
    image = pipe0(a, num_inference_steps=int(d), width=int(e), height=int(f), guidance_scale=float(c))["sample"]
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 219, in __call__
    text_embeddings = self.text_encoder(text_input_ids.to(self.device))[0]
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 722, in forward
    return self.text_model(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 643, in forward
    encoder_outputs = self.encoder(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 574, in forward
    layer_outputs = encoder_layer(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 317, in forward
    hidden_states, attn_weights = self.self_attn(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 257, in forward
    attn_output = torch.bmm(attn_probs, value_states)
RuntimeError: expected scalar type Half but found Float

Or, when using another scheduler, the error trace is different but points to the same error source; here with the DDIMScheduler:

    text_embeddings = self.text_encoder(text_input_ids.to(self.device))[0]
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 722, in forward
    return self.text_model(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 643, in forward
    encoder_outputs = self.encoder(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 574, in forward
    layer_outputs = encoder_layer(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 317, in forward
    hidden_states, attn_weights = self.self_attn(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 257, in forward
    attn_output = torch.bmm(attn_probs, value_states)
RuntimeError: expected scalar type Half but found Float
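
Both traces end at the same call: torch.bmm inside CLIP's self-attention receives one Half and one Float operand. Just to illustrate what that error means in isolation (a standalone sketch, not the pipeline code itself):

```
import torch

# Mimic attn_probs (half) multiplied with value_states (float), as in the
# traceback above; mixing dtypes in torch.bmm raises this kind of error.
attn_probs = torch.randn(1, 2, 3, dtype=torch.float16)
value_states = torch.randn(1, 3, 4, dtype=torch.float32)

try:
    torch.bmm(attn_probs, value_states)
except RuntimeError as err:
    print(err)  # e.g. "expected scalar type Half but found Float"
```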

Best regards Marc

Reproduction

No response

Logs

No response

System Info

  • diffusers version: 0.4.1
  • Platform: Linux-5.4.0-124-generic-x86_64-with-glibc2.10
  • Python version: 3.8.5
  • PyTorch version (GPU?): 1.11.0 (True)
  • Huggingface_hub version: 0.10.0
  • Transformers version: 4.19.2

EDIT: To my surprise, the code below works without throwing errors, BUT the processing speed is exactly the same as in fp32 mode. Only when adding autocast is it faster. Obviously, when writing the code this way, the model does not end up in fp16: no errors, but also no speed-up.

```
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_type=torch.float16,
    revision="fp16",
    use_auth_token="hf_LFWSneVmdLYPKbkIRpCrCKVnqgRxx"
)
pipe = pipe.to("cuda:0")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt)["sample"]
image[0].save("astronaut.jpg")
```
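
One possible reason for the unchanged speed, going only by the snippet above: the keyword is written torch_type, while from_pretrained expects torch_dtype (as in the first snippet), so the half-precision request is presumably ignored and the weights stay in fp32. A corrected sketch with a generic dtype check (token elided):

```
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,  # note: torch_dtype, not torch_type
    use_auth_token=True,        # placeholder for the real token
).to("cuda:0")

# Generic check that the weights are really in half precision.
print(next(pipe.unet.parameters()).dtype)          # should print torch.float16
print(next(pipe.text_encoder.parameters()).dtype)  # should print torch.float16
```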

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
Marcophono2 commented, Oct 7, 2022

What the … It is working now!! I changed my code a thousand times; in the end I came back to the code I posted here and now it works! 14.46 it/s - that is nice! 😃 But why does it suddenly work? The VRAM is now also filled with only 4.5 GB. Strange. I will completely restart my instance to see what happens then.
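
A rough way to double-check the peak VRAM figure from inside Python (a generic PyTorch measurement; nvidia-smi reads higher because it also counts the CUDA context and PyTorch's cache):

```
import torch

torch.cuda.reset_peak_memory_stats("cuda:0")
image = pipe0(prompt)["sample"][0]  # pipe0 and prompt as defined earlier in the thread
print(f"peak allocated: {torch.cuda.max_memory_allocated('cuda:0') / 2**30:.2f} GiB")
```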

Best regards Marc

0 reactions
Marcophono2 commented, Oct 8, 2022

unet is a speed monster!

Read more comments on GitHub >

Top Results From Across the Web

  • AutoCast for mixed precision/fp16 fails? · Issue #35 - GitHub
    I have tried to train the model using torch.cuda.amp.autocast() but the training doesn't seem to speed up or memory usage remains the same as ...
  • How To Fit a Bigger Model and Train It Faster - Hugging Face
    Here is the full description from this comment: Autocast maintains a cache of the FP16 casts of model parameters (leaves).
  • AMP autocast not faster than FP32 - mixed-precision
    The 15% speedup seems to be low, as the FP16 kernels are not fully saturating the GPU, which is visible in Nsight...
  • Chapter 8: Mixed Precision Training - DGL Docs
    On an NVIDIA V100 (16GB) machine, training this model without fp16 consumes 15.2GB GPU memory; with fp16 turned on, the training consumes 12.8G...
  • Mixed precision training - fastai
    To understand the problems with half precision, let's look briefly at what an FP16 ... Mixed precision training using PyTorch's autocast and GradScaler ...
