
Unexpectedly high fp16 memory usage


I’ve been noticing that using fp16 does not result in much difference in model size or memory usage. Using the script below (taken directly from your docs) and only changing the fp16 flag from True to False yields a 4% difference in VRAM usage and exactly the same checkpoint size for both runs.

This seems suspiciously small compared to other projects I’ve used with fp16 enabled, and a few people on the LAION Discord Imagen channel are noticing the same thing, although others report a bigger difference.

I’m wondering if it could come down to differences in training scripts, since we all seem to be using our own custom variations.

import torch
from imagen_pytorch import Unet, Imagen, SRUnet256, ImagenTrainer

unet1 = Unet(
    dim = 32,
    dim_mults = (1, 2, 4),
    num_resnet_blocks = 3,
    layer_attns = (False, True, True),
    layer_cross_attns = False,
    use_linear_attn = True
)

unet2 = SRUnet256(
    dim = 32,
    dim_mults = (1, 2, 4),
    num_resnet_blocks = (2, 4, 8),
    layer_attns = (False, False, True),
    layer_cross_attns = False
)

imagen = Imagen(
    condition_on_text = False,
    unets = (unet1, unet2),
    image_sizes = (64, 128),
    timesteps = 1000
)

trainer = ImagenTrainer(
    imagen,
    fp16 = False  # change this flag and compare model sizes / memory usage
).cuda()

# random images stand in for a real dataset in this reproduction
training_images = torch.randn(4, 3, 256, 256).cuda()

# train only the first unet for the comparison
for i in range(100):
    loss = trainer(training_images, unet_number = 1)
    trainer.update(unet_number = 1)

trainer.save("./checkpoint.pt")
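
Not part of the original script, but for reference, the two numbers being compared here (peak VRAM and checkpoint size) can be read out directly with PyTorch and the standard library; a minimal sketch, assuming it runs right after the script above finishes:

import os

# Peak VRAM allocated by tensors during the run above, in GiB. Call
# torch.cuda.reset_peak_memory_stats() before training if comparing
# several runs in the same process.
print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GiB")

# Size of the saved checkpoint on disk, in MiB.
print(f"checkpoint size: {os.path.getsize('./checkpoint.pt') / 1024**2:.1f} MiB")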

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

1 reaction
trufty commented, Aug 22, 2022

Yeah, if it’s working for you, it has to be a local env issue… ugh. Thanks for helping so far. (And yes, I did have the fp16 flag set correctly.)

1 reaction
trufty commented, Aug 22, 2022

Just to confirm I’m not going crazy, I interrogated the fp16 = True model from the script above, and the dtype of every layer is float32 😢

unets.0.final_conv.weight    | torch.float32
unets.0.final_conv.bias    | torch.float32
unets.1.null_text_embed    | torch.float32
unets.1.null_text_hidden    | torch.float32
...
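
For reference, a listing like the one above can be produced by iterating over the module’s parameters; a minimal sketch (not from the original comment), assuming the imagen object built in the reproduction script:

# Print the dtype of every parameter in the Imagen module defined above.
for name, param in imagen.named_parameters():
    print(f"{name}    | {param.dtype}")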

