
Still a problem with fp16 and without autocast

See original GitHub issue

Describe the bug

Hello! This issue has been discussed often and, as far as I understand, it has since been fixed. But I still get a

RuntimeError: expected scalar type Half but found Float

when I try to run the model with fp16 but without autocast.
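For comparison, the autocast-wrapped variant is the one that runs without this error here; a minimal sketch (the prompt and token are placeholders, not taken from my actual script):

```
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
    use_auth_token=True,  # placeholder; the real token goes here
).to("cuda:0")

prompt = "a photo of an astronaut riding a horse on mars"  # placeholder prompt
with torch.autocast("cuda"):
    # autocast casts ops to a common dtype on the fly, which masks the
    # Half-vs-Float mismatch reported below
    image = pipe(prompt)["sample"][0]
```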

My code:

import torch
from diffusers import StableDiffusionPipeline

pipe0 = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    use_auth_token="hf_LFWSneVmdLYPKbkIRpCrCKVxx",
    revision="fp16",
    torch_dtype=torch.float16
).to("cuda:0")

image = pipe0(a, num_inference_steps=int(d), width=int(e), height=int(f), guidance_scale=float(c))["sample"]

I am on a GPU (RTX 3090) and have the latest build, 0.4.1. I tried it with different schedulers, but if I do not use autocast I get this error message (with the default scheduler):

  File "generator1GPU.py", line 134, in heron1
    image = pipe0(a, num_inference_steps=int(d), width=int(e), height=int(f), guidance_scale=float(c))["sample"]
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 219, in __call__
    text_embeddings = self.text_encoder(text_input_ids.to(self.device))[0]
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 722, in forward
    return self.text_model(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 643, in forward
    encoder_outputs = self.encoder(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 574, in forward
    layer_outputs = encoder_layer(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 317, in forward
    hidden_states, attn_weights = self.self_attn(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 257, in forward
    attn_output = torch.bmm(attn_probs, value_states)
RuntimeError: expected scalar type Half but found Float

Or, when using another scheduler, the error trace is different but points to the same error source; here with the DDIMScheduler:

    text_embeddings = self.text_encoder(text_input_ids.to(self.device))[0]
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 722, in forward
    return self.text_model(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 643, in forward
    encoder_outputs = self.encoder(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 574, in forward
    layer_outputs = encoder_layer(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 317, in forward
    hidden_states, attn_weights = self.self_attn(
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/ldm/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 257, in forward
    attn_output = torch.bmm(attn_probs, value_states)
RuntimeError: expected scalar type Half but found Float
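
Both traces end at the same call: torch.bmm inside CLIP's self-attention receives one Half and one Float operand. Just to illustrate what that error means in isolation (a standalone sketch, not the pipeline code itself):

```
import torch

# Mimic attn_probs (half) multiplied with value_states (float), as in the
# traceback above; mixing dtypes in torch.bmm raises this kind of error.
attn_probs = torch.randn(1, 2, 3, dtype=torch.float16)
value_states = torch.randn(1, 3, 4, dtype=torch.float32)

try:
    torch.bmm(attn_probs, value_states)
except RuntimeError as err:
    print(err)  # e.g. "expected scalar type Half but found Float"
```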

Best regards Marc

Reproduction

No response

Logs

No response

System Info

  • diffusers version: 0.4.1
  • Platform: Linux-5.4.0-124-generic-x86_64-with-glibc2.10
  • Python version: 3.8.5
  • PyTorch version (GPU?): 1.11.0 (True)
  • Huggingface_hub version: 0.10.0
  • Transformers version: 4.19.2

EDIT: To my surprise, the code below works without throwing errors, BUT the processing speed is exactly the same as in fp32 mode. Only when adding autocast is it faster. Obviously, when writing the code this way, the model does not end up in fp16: no errors, but also no speed-up.

```
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_type=torch.float16,
    revision="fp16",
    use_auth_token="hf_LFWSneVmdLYPKbkIRpCrCKVnqgRxx"
)
pipe = pipe.to("cuda:0")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt)["sample"]
image[0].save("astronaut.jpg")
```
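
One possible reason for the unchanged speed, going only by the snippet above: the keyword is written torch_type, while from_pretrained expects torch_dtype (as in the first snippet), so the half-precision request is presumably ignored and the weights stay in fp32. A corrected sketch with a generic dtype check (token elided):

```
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,  # note: torch_dtype, not torch_type
    use_auth_token=True,        # placeholder for the real token
).to("cuda:0")

# Generic check that the weights are really in half precision.
print(next(pipe.unet.parameters()).dtype)          # should print torch.float16
print(next(pipe.text_encoder.parameters()).dtype)  # should print torch.float16
```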

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
Marcophono2 commented, Oct 7, 2022

What the … It is working now!! I changed my code a thousand times; in the end I came back to the code I posted here and now it works! 14.46 it/s - that is nice! 😃 But why does it suddenly work? The VRAM is now also filled with only 4.5 GB. Strange. I will completely restart my instance to see what happens then.
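
A rough way to double-check the peak VRAM figure from inside Python (a generic PyTorch measurement; nvidia-smi reads higher because it also counts the CUDA context and PyTorch's cache):

```
import torch

torch.cuda.reset_peak_memory_stats("cuda:0")
image = pipe0(prompt)["sample"][0]  # pipe0 and prompt as defined earlier in the thread
print(f"peak allocated: {torch.cuda.max_memory_allocated('cuda:0') / 2**30:.2f} GiB")
```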

Best regards Marc

0 reactions
Marcophono2 commented, Oct 8, 2022

unet is a speed monster!

Read more comments on GitHub >

Top Results From Across the Web

  • AutoCast for mixed precision/fp16 fails? · Issue #35 - GitHub
    I have tried to train the model using torch.cuda.amp.autocast() but the training doesn't seem to speed up or memory usage remains the same as ...
  • How To Fit a Bigger Model and Train It Faster - Hugging Face
    Here is the full description from this comment: Autocast maintains a cache of the FP16 casts of model parameters (leaves).
  • AMP autocast not faster than FP32 - mixed-precision
    The 15% speedup seems to be low, as the FP16 kernels are not fully saturating the GPU, which is visible in Nsight...
  • Chapter 8: Mixed Precision Training - DGL Docs
    On an NVIDIA V100 (16GB) machine, training this model without fp16 consumes 15.2GB GPU memory; with fp16 turned on, the training consumes 12.8G...
  • Mixed precision training - fastai
    To understand the problems with half precision, let's look briefly at what an FP16 ... Mixed precision training using PyTorch's autocast and GradScaler ...
