
Move the module to the precision dtype

See original GitHub issue

🐛 Bug

@SeanNaren When I use bf16 and check the dtype of the model, its precision appears to be fp32 (and I do not see the memory gains I expect). In other frameworks that support bf16 (like fairseq), the model’s dtype is torch.bfloat16. Is there a simple example that “proves” this feature reduces memory consumption as it should? I suspect something might be wrong (but of course, I might be mistaken). Thank you!
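
For reference, a minimal sketch of the check being described, assuming PyTorch Lightning ≥ 1.5 with a CUDA device; `TinyModule` is a hypothetical toy module, not the reporter’s actual model:

```python
import torch
import pytorch_lightning as pl

class TinyModule(pl.LightningModule):  # hypothetical toy module for illustration
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        return self.layer(batch).sum()

    def train_dataloader(self):
        return torch.utils.data.DataLoader(torch.randn(64, 32), batch_size=8)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

model = TinyModule()
trainer = pl.Trainer(accelerator="gpu", devices=1, precision="bf16", max_steps=5)
trainer.fit(model)

# As reported in this issue, this still prints torch.float32: autocast casts
# operations on the fly rather than converting the stored weights to bfloat16.
print(next(model.parameters()).dtype)
```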

To Reproduce

Launch any job with precision=bf16 and compare it with precision=32.
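
One way to make that comparison concrete (a sketch that reuses the hypothetical `TinyModule` defined above and compares peak CUDA memory between the two precision settings):

```python
import torch
import pytorch_lightning as pl

def peak_mib(precision):
    """Run a short fit and return peak CUDA memory in MiB."""
    torch.cuda.reset_peak_memory_stats()
    model = TinyModule()  # the toy module sketched above
    trainer = pl.Trainer(accelerator="gpu", devices=1,
                         precision=precision, max_steps=5)
    trainer.fit(model)
    return torch.cuda.max_memory_allocated() / 2**20

print("precision=32   :", peak_mib(32), "MiB")
print("precision=bf16 :", peak_mib("bf16"), "MiB")
```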

Expected behavior

This feature should save 30-50% of memory, but I do not see such gains in Lightning.
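
For a rough sense of where a figure in that range comes from, here is a back-of-the-envelope estimate of weight storage only (activations, gradients and optimizer state are ignored, and the 3B parameter count is purely illustrative):

```python
import torch

n_params = 3_000_000_000  # illustrative, roughly a T5-XL-sized model

fp32_gib = n_params * torch.finfo(torch.float32).bits / 8 / 2**30
bf16_gib = n_params * torch.finfo(torch.bfloat16).bits / 8 / 2**30

print(f"fp32 weights: {fp32_gib:.1f} GiB")   # ~11.2 GiB
print(f"bf16 weights: {bf16_gib:.1f} GiB")   # ~5.6 GiB
```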

Environment

  • CUDA:
    • GPU:
      • GeForce RTX 3090
    • available: True
    • version: 11.3
  • Packages:
    • numpy: 1.21.2
    • pyTorch_debug: False
    • pyTorch_version: 1.11.0
    • pytorch-lightning: 1.6.0dev
    • tqdm: 4.62.3
  • System:
    • OS: Linux
    • architecture:
      • 64bit
      • ELF
    • processor: x86_64
    • python: 3.8.12
    • version: #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019

Additional context

bf16 is a very important feature. It is usually more numerically stable than fp16, and Lightning should support it effectively (models that are pretrained with bf16 should not be run in fp16) 😃

cc @borda @tchaton @rohitgr7 @carmocca @justusschock @awaelchli @akihironitta

Issue Analytics

  • State: open
  • Created: a year ago
  • Reactions: 1
  • Comments: 9 (7 by maintainers)

Top GitHub Comments

1 reaction
SeanNaren commented, Apr 5, 2022

@yuvalkirstain I’m glad it worked! Hopefully we’ll get the feature in soon 😃

1 reaction
yuvalkirstain commented, Apr 5, 2022

@SeanNaren Yes, doing so results in less memory on the GPU with identical results, thank you!

T5-XL (3B parameters), inference on the SQuAD dataset:
  • 8058 MiB / 24268 MiB with model = model.bfloat16()
  • 13196 MiB / 24268 MiB without

Regarding converting the pl_module internally for users: definitely, IMO it makes more sense for the trainer to take care of that rather than the model.
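
A sketch of the workaround reported in this comment, using a plain torch module as a stand-in rather than the reporter’s actual T5-XL/Lightning setup, to show that .bfloat16() converts the stored weights and roughly halves their memory:

```python
import torch

# Stand-in model (illustrative only; the reporter used T5-XL via Lightning).
model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 4096),
).cuda()

print(next(model.parameters()).dtype)                 # torch.float32
print(torch.cuda.memory_allocated() // 2**20, "MiB")  # ~128 MiB of weights

model = model.bfloat16()  # the manual workaround: convert the weights themselves

print(next(model.parameters()).dtype)                 # torch.bfloat16
print(torch.cuda.memory_allocated() // 2**20, "MiB")  # ~64 MiB of weights
```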

Read more comments on GitHub >

Top Results From Across the Web

.to(dtype) method with custom module and buffer of fixed ...
Hello, I made a custom module that need high precision (float64) ... and move it to the current device if needed during the...
Read more >
Mixed precision | TensorFlow Core
Overview. Mixed precision is the use of both 16-bit and 32-bit floating-point types in a model during training to make it run faster...
Read more >
MPS 16Bit Not Working correctly · Issue #78168 · pytorch ...
Describe the bug When i try to use half-precision together with the new mps backend, I get the following: >>> import torch >>>...
Read more >
Precision is lost / changed once data goes to Cython module
The first two steps to solving the problem are: Determine the specific integer type used by NumPy on your platform (e.g. int32 ,...
Read more >
Structure Overview — PyTorch-Metrics 0.11.0 documentation
Metrics are simple subclasses of Module and their metric states behave similar to ... You can always check the precision/dtype of the metric...
Read more >
