Memory issue when training 1024 resolution

I’m trying to train on a 1024x1024 dataset on a V100 GPU. I tried both the TensorFlow version and the PyTorch version. Despite setting batch-gpu to 1, the TensorFlow version always runs out of system RAM (after the first tick; total system RAM is 51 GB), and the PyTorch version always runs out of CUDA memory (before the first tick).

Here are my training settings:

python run_network.py --train --metrics 'none' --gpus 0 --batch-gpu 1 --resolution 1024 \
 --ganformer-default --expname art1 --dataset 1024art

Also, I always encounter the warning: tcmalloc: large alloc
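
One way to narrow down where the system RAM goes is to log the process's peak resident memory at regular points, for example once per tick. The following is a minimal sketch using only Python's standard library. It assumes a Unix-like system (the `resource` module is not available on Windows), and the helper names `peak_rss_mib` and `log_peak_rss` are hypothetical and not part of the GANformer code base.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_peak_rss(label: str) -> float:
    """Print and return the peak RSS so far (hypothetical helper)."""
    mib = peak_rss_mib()
    print(f"[{label}] peak RSS: {mib:.1f} MiB")
    return mib


if __name__ == "__main__":
    log_peak_rss("start")
    # Build a 64 MiB bytes object; the repeat writes every byte,
    # so the pages are actually touched and count toward RSS.
    payload = b"x" * (64 * 1024 * 1024)
    log_peak_rss("after allocation")
```

Calling such a helper at the start of training and again after each tick would show whether memory grows steadily or jumps at one specific step, such as when outputs are saved.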

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
BlueberryGin commented, Feb 17, 2022

Btw, I’m currently running fine with only saving output images and not saving attention maps
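
As a rough illustration of why extra per-pixel outputs get expensive at this resolution: the size of a dense array grows with the square of the image side, so a 1024x1024 map takes 16 times the memory of a 256x256 one. The sketch below is back-of-the-envelope arithmetic only. The count of 16 maps per image and the float32 storage are assumptions chosen for illustration, not values taken from the GANformer code, and `dense_map_bytes` is a hypothetical helper.

```python
def dense_map_bytes(resolution: int, num_maps: int = 1, bytes_per_value: int = 4) -> int:
    """Bytes needed for num_maps dense square maps of the given side length."""
    return resolution * resolution * num_maps * bytes_per_value


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to MiB."""
    return num_bytes / 2**20


if __name__ == "__main__":
    # Assumed for illustration: 16 float32 maps per image.
    for res in (256, 512, 1024):
        size = dense_map_bytes(res, num_maps=16)
        print(f"{res}x{res}, 16 maps: {to_mib(size):.0f} MiB per image")
```

Under these assumptions the totals are 4 MiB, 16 MiB, and 64 MiB per image, so buffers that are harmless at 256x256 can add up quickly when many images are held in memory at 1024x1024.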

0 reactions
dorarad commented, Feb 21, 2022

I’ll update the default options accordingly so that people won’t run into memory issues. Thank you for opening this issue!

