Does saving checkpoints in Ignite take up GPU memory?
🐛 Bug description
Environment
- PyTorch Version (e.g., 1.4): 1.7
- Ignite Version (e.g., 0.3.0): 0.4.1
- OS (e.g., Linux): Linux
- How you installed Ignite (conda, pip, source): pip
- Python version: 3.7
- Any other relevant information:
for i in range(n):
    common.setup_common_training_handlers()
    common.gen_save_best_models_by_val_score()
    trainer.run()
During training, GPU memory usage increases gradually and eventually leads to a CUDA out-of-memory error.
Is this problem caused by saving checkpoints, or by something else?
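One way to narrow this down is to record how much memory is allocated after each pass of the outer loop and look at the difference between passes. The following is only a minimal sketch and is not taken from the issue: the helper names `memory_after_each_run` and `growth_per_run`, and the `run_once` and `read_allocated` parameters, are hypothetical. The memory reader is passed in as a function so the helper does not depend on a GPU being present.

```python
import gc
from typing import Callable, List


def memory_after_each_run(
    n_runs: int,
    run_once: Callable[[int], None],
    read_allocated: Callable[[], int],
) -> List[int]:
    """Call run_once(i) n_runs times and record allocated bytes after each call."""
    history = []
    for i in range(n_runs):
        run_once(i)
        gc.collect()  # drop unreachable Python objects before measuring
        history.append(read_allocated())
    return history


def growth_per_run(history: List[int]) -> List[int]:
    """Difference in allocated bytes between consecutive runs."""
    return [after - before for before, after in zip(history, history[1:])]


# Usage with PyTorch on a CUDA device (run_once is your own function that
# builds the trainer, attaches the handlers and calls trainer.run()):
#   import torch
#   history = memory_after_each_run(n, run_once, torch.cuda.memory_allocated)
#   print(growth_per_run(history))
```

If the differences stay clearly positive from one pass to the next, something created in a pass is probably still referenced afterwards. Running the same loop with the checkpoint handlers left out would then show whether they are involved.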
Issue Analytics
- State:
- Created 10 months ago
- Comments:16
Okay, I get it, thank you very much. Thank you for being professional, responsible, patient, powerful, etc. Thank you!
I would really like to provide the code so this can be solved as soon as possible, but it is difficult to transfer it out of our intranet.
The way I am using it is as follows: