Increase in GPU memory usage with PyTorch Lightning
Over the last week I have been porting my monocular depth estimation code to PyTorch Lightning, and everything is working. However, my models now require more GPU memory than before, to the point where I need to significantly decrease the batch size at training time. These are the relevant versions and the Trainer parameters I am using:
FROM nvidia/cuda:10.1-devel-ubuntu18.04
ENV PYTORCH_VERSION=1.4.0
ENV TORCHVISION_VERSION=0.5.0
ENV CUDNN_VERSION=7.6.5.32-1+cuda10.1
ENV NCCL_VERSION=2.4.8-1+cuda10.1
ENV PYTORCH_LIGHTNING_VERSION=0.7.1
cfg.arch.gpus = 8
cfg.arch.num_nodes = 1
cfg.arch.num_workers = 8
cfg.arch.distributed_backend = 'ddp'
cfg.arch.amp_level = 'O0'
cfg.arch.precision = 32
cfg.arch.benchmark = True
cfg.arch.min_epochs = 1
cfg.arch.max_epochs = 50
cfg.arch.checkpoint_callback = False
cfg.arch.callbacks = []
cfg.arch.gradient_clip_val = 0.0
cfg.arch.accumulate_grad_batches = 1
cfg.arch.val_check_interval = 1.0
cfg.arch.check_val_every_n_epoch = 1
cfg.arch.num_sanity_val_steps = 0
cfg.arch.progress_bar_refresh_rate = 1
cfg.arch.fast_dev_run = False
cfg.arch.overfit_pct = 0.0
cfg.arch.train_percent_check = 1.0
cfg.arch.val_percent_check = 1.0
cfg.arch.test_percent_check = 1.0
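For reference, a minimal sketch of how the settings above map onto the Trainer constructor (the `trainer_kwargs` name is mine; with Lightning 0.7.x each key below corresponds to a Trainer keyword argument, while `num_workers` is a DataLoader argument rather than a Trainer one, so it is omitted here):

```python
# Collect the cfg.arch.* values into a single dict of Trainer keyword
# arguments (sketch; assumes the Lightning 0.7.x argument names).
trainer_kwargs = dict(
    gpus=8,
    num_nodes=1,
    distributed_backend="ddp",
    amp_level="O0",
    precision=32,
    benchmark=True,
    min_epochs=1,
    max_epochs=50,
    checkpoint_callback=False,
    callbacks=[],
    gradient_clip_val=0.0,
    accumulate_grad_batches=1,
    val_check_interval=1.0,
    check_val_every_n_epoch=1,
    num_sanity_val_steps=0,
    progress_bar_refresh_rate=1,
    fast_dev_run=False,
    overfit_pct=0.0,
    train_percent_check=1.0,
    val_percent_check=1.0,
    test_percent_check=1.0,
)
# trainer = pytorch_lightning.Trainer(**trainer_kwargs)  # requires lightning 0.7.x
```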
Probably because of this extra memory usage, I am having trouble replicating my previous results. Could you please advise on possible solutions? I will open-source the code as soon as I manage to replicate current results.
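To quantify the difference between the pre-Lightning and post-Lightning runs, a lightweight external probe could be used (my sketch, not part of the issue; the helper names are hypothetical, while the `nvidia-smi` query flags are standard):

```python
# Sketch of a per-GPU memory probe: shell out to nvidia-smi and parse the
# used-memory column so usage can be logged before and after the port.
import subprocess

def parse_memory_used(csv_output: str) -> list:
    """Parse the output of
    `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`
    into a list of per-GPU used-memory values in MiB."""
    return [int(line.strip()) for line in csv_output.splitlines() if line.strip()]

def gpu_memory_used() -> list:
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_memory_used(out)
```

Calling `gpu_memory_used()` before and after the first training step on both code paths would show where the extra allocation appears.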
Issue Analytics
- State:
- Created 3 years ago
- Reactions: 1
- Comments: 7 (7 by maintainers)

@jeremyjordan can we get that memory profiler? @vguizilini mind trying again from master?
Following up on this issue, is there anything else I should provide to facilitate debugging?