Memory issue when training 1024 resolution
I’m trying to train on a 1024x1024 dataset on a V100 GPU. I tried both the TensorFlow version and the PyTorch version. Despite setting batch-gpu to 1, the TensorFlow version always runs out of system RAM (after the first tick; total system RAM is 51 GB), and the PyTorch version always runs out of CUDA memory (before the first tick).
Here are my training settings:
python run_network.py --train --metrics 'none' --gpus 0 --batch-gpu 1 --resolution 1024 \
--ganformer-default --expname art1 --dataset 1024art
Also, I always encounter the warning:
tcmalloc: large alloc
Issue Analytics
- Created 2 years ago
- Comments:5 (2 by maintainers)
Btw, training currently runs fine for me when I save only the output images and not the attention maps.
I’ll update the default options accordingly so that people won’t run into memory issues. Thank you for opening this issue!
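For anyone who wants to confirm which step is responsible for the system RAM growth before changing options, here is a minimal sketch of measuring peak resident memory around a single step. It uses only the Python standard library (`resource`, Unix only). The names `peak_rss_mb`, `measure_growth` and `fake_save_step` are hypothetical helpers for illustration and are not part of the GANformer code; in practice you would wrap whatever function writes the visualizations.

```python
import resource
import sys


def peak_rss_mb():
    """Return the peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def measure_growth(step, *args, **kwargs):
    """Run `step` and return (its result, growth in peak RSS in MiB)."""
    before = peak_rss_mb()
    result = step(*args, **kwargs)
    return result, peak_rss_mb() - before


def fake_save_step(n_bytes):
    # Hypothetical stand-in for a step that builds a large host-side
    # buffer, such as collecting maps before writing them to disk.
    buf = bytearray(n_bytes)
    return len(buf)


if __name__ == "__main__":
    size, growth = measure_growth(fake_save_step, 64 * 1024 * 1024)
    print(f"allocated {size} bytes, peak RSS grew by {growth:.1f} MiB")
```

Because peak RSS never decreases, the reported growth is a lower bound of zero and only shows steps that push the process past its previous high-water mark. That is usually enough to tell whether image saving or attention-map saving is the step that precedes the out-of-memory failure.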