
Memory estimation inconsistent with actual GPU memory utilization

See original GitHub issue

Describe the bug

Memory estimation is inconsistent with actual GPU memory utilization.

To Reproduce

  • I am using a simple UNet with 2 layers (same as here).
  • The input size is (1, 1, 4096, 3328)

Expected behavior

When forwarding an image of size (1, 1, 4096, 3328) in testing mode, i.e., with model.eval() on, the reported GPU memory is approximately 15 GB:

[Screenshot (2022-07-01): observed GPU memory usage of roughly 15 GB]
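For context, here is a minimal sketch of how that figure could be measured. TinyUNet is a hypothetical stand-in for the 2-layer UNet referenced above (the real architecture is behind the "here" link and may differ), so the absolute numbers will not match the reporter's; the issue also does not say whether gradients were disabled during the measurement.

import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    # Hypothetical stand-in for the reporter's 2-layer UNet.
    def __init__(self):
        super().__init__()
        self.down = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 64, 2, stride=2)
        self.out = nn.Conv2d(64, 1, 1)

    def forward(self, x):
        x = self.down(x)
        x = self.pool(x)
        x = self.up(x)
        return self.out(x)

device = torch.device("cuda")
model = TinyUNet().to(device).eval()
x = torch.randn(1, 1, 4096, 3328, device=device)

torch.cuda.reset_peak_memory_stats(device)
with torch.no_grad():          # assumed; the issue only mentions model.eval()
    _ = model(x)
torch.cuda.synchronize(device)

print(f"peak allocated: {torch.cuda.max_memory_allocated(device) / 1024**3:.1f} GiB")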

However, torchinfo.summary reports 50 GB, even though eval is passed as the mode argument:

summary(model, input_size=(1, 1, 4096, 3328), mode='eval', device=device)

[Screenshot (2022-07-01): torchinfo.summary output estimating roughly 50 GB]
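Continuing the sketch above (same hypothetical model and device), the corresponding torchinfo call would look like this; the reporter saw an estimate of roughly 50 GB for their model at this step.

from torchinfo import summary

stats = summary(
    model,                            # hypothetical TinyUNet from the sketch above
    input_size=(1, 1, 4096, 3328),    # (N, C, H, W) as reported in the issue
    mode='eval',
    device=device,
)
print(stats)   # prints the per-layer table and the memory/size estimates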

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
mert-kurttutan commented, Oct 13, 2022

Actually, gradients are not calculated in either mode, since torch.no_grad is used for both train and eval mode; see the forward_pass function in torchinfo.py. I also checked: GPU memory usage remains the same when changing the mode.
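To illustrate that observation (this is not torchinfo's actual code, just a sketch of the behaviour described above): once the forward pass runs inside torch.no_grad, no autograd graph is kept, so the measured peak is essentially the same in train() and eval() mode.

import torch

def peak_forward_gib(model, x):
    # Peak GPU memory (GiB) for one forward pass wrapped in torch.no_grad,
    # mirroring what the comment above says torchinfo's forward_pass does.
    torch.cuda.reset_peak_memory_stats(x.device)
    with torch.no_grad():
        model(x)
    torch.cuda.synchronize(x.device)
    return torch.cuda.max_memory_allocated(x.device) / 1024**3

# With any CUDA model and input, both calls report essentially the same peak:
# print(peak_forward_gib(model.train(), x))
# print(peak_forward_gib(model.eval(), x))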

1 reaction
rodrigovimieiro commented, Sep 25, 2022

@devrimcavusoglu I don’t have enough GPU memory for the model. That’s why I was trying to estimate it.

Read more comments on GitHub >
