Memory error - Optimization to increase batch size
When training with samples of size 256x256 pixels, a batch size over 32 causes an out-of-memory error from CUDA. We have to find a way to optimize the process in order to increase the batch size.
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/.../THCStorage.cu:58
NOTE: May be specific to our (GC HPC) computing environment.
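To see how close a given batch size gets to the card's limit, something like the following can be dropped into the training loop. This is a minimal sketch, not code from this repository; `model`, `batch`, `target`, and `criterion` are placeholders, and only the `torch.cuda` memory calls are standard PyTorch.

```python
# Minimal sketch (not from the original issue): log GPU memory around a
# training step to see how close a given batch size gets to the limit.
import torch

def log_gpu_memory(tag=""):
    # Standard torch.cuda queries for current and peak allocated memory
    allocated = torch.cuda.memory_allocated() / 1024 ** 2
    peak = torch.cuda.max_memory_allocated() / 1024 ** 2
    print(f"[{tag}] allocated: {allocated:.0f} MiB, peak: {peak:.0f} MiB")

# Hypothetical usage inside a training loop:
# log_gpu_memory("before forward")
# output = model(batch)
# loss = criterion(output, target)
# loss.backward()
# log_gpu_memory("after backward")
```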
Top GitHub Comments
Validation of Checkpointed results
Results when using checkpointing are slightly different from those of the original unetsmall model because cuDNN has non-deterministic kernels. I ran tests using suggestions from the PyTorch discussions https://discuss.pytorch.org/t/non-reproducible-result-with-gpu/1831 and https://discuss.pytorch.org/t/deterministic-non-deterministic-results-with-pytorch/9087.
Using the same sample files, I ran train_model.py twice using the unetsmall model (batch_size = 32) and I ran it once using the checkpointed_unet model (batch_size = 50). Then, I classified some images with the resulting models.
The settings to try to get reproducible results were set as follows at the beginning of the code:
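The exact settings block is not preserved in this excerpt; the following is a sketch of the typical seeding and cuDNN flags recommended in the linked PyTorch discussions, using the seed value 7 mentioned in the tests below.

```python
# Sketch of the reproducibility settings described above (the original code
# block is not preserved here); these follow the linked PyTorch discussions.
import random
import numpy as np
import torch

seed = 7  # seed value used in the tests described below
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
# Force cuDNN to pick deterministic kernels (slower, but reproducible)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```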
Also, in the DataLoaders, the parameters used for the instantiation had num_workers = 0 and shuffle = False, as in the sketch below.
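A minimal sketch of such an instantiation; `trn_dataset` and `batch_size` are placeholders, and only `num_workers` and `shuffle` come from the comment above.

```python
from torch.utils.data import DataLoader

trn_dataloader = DataLoader(trn_dataset,
                            batch_size=batch_size,
                            num_workers=0,   # single-process loading for reproducibility
                            shuffle=False)   # fixed sample order between runs
```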
Running the original unetsmall configuration without checkpoints two times yielded two slightly different results. Here are some examples of the results obtained when running image_classification.py on one of the training images with each trained model. Sections of the images that weren't classified were left white. Please note that the configurations and the number of samples weren't set to yield optimal results; verifying reproducibility was the goal of these tests. The number of training samples was set to the number of samples produced during sample creation.
I think that the results of the checkpointed_unet are similar enough to the unetsmall's results for us to consider it a good memory- and time-optimized version of our unetsmall net architecture. I have added it as a model choice for our program.
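The checkpointed_unet implementation itself is not shown in this excerpt. As a rough illustration, gradient checkpointing is typically applied to a U-Net-style forward pass with torch.utils.checkpoint, along the lines of the sketch below; the block names and class are placeholders, not the actual model code.

```python
# Sketch of how checkpointing is typically applied to a U-Net-style forward
# pass; enc1/enc2/dec1/dec2 are placeholder sub-modules, not the real model.
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedUNetSketch(nn.Module):
    def __init__(self, enc1, enc2, dec1, dec2):
        super().__init__()
        self.enc1, self.enc2 = enc1, enc2
        self.dec1, self.dec2 = dec1, dec2

    def forward(self, x):
        # The first block runs normally (its input usually has
        # requires_grad=False); the inner blocks are checkpointed, so their
        # activations are freed after the forward pass and recomputed during
        # backward, trading extra compute for lower memory use.
        x = self.enc1(x)
        x = checkpoint(self.enc2, x)
        x = checkpoint(self.dec1, x)
        return self.dec2(x)
```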
Throughout my tests, I observed that the models produced by training are more accurate when the random number generators aren't seeded. The checkpointed_unet, observationally, seems to be more affected by this than the unetsmall.
Using checkpointing in the unetsmall net increases the speed of training. Tests were performed using the following parameters:
Best results
Using checkpoints in the net design does seem to affect the results of the training. Tests were done on the original and on the checkpointed nets while setting the random seed to 7, and the resulting models gave similar but slightly different results. In the first test, the original algorithm gave results closer to the ground truth. In the second test, the checkpointed version of the net yielded better results.