Memory issue when training 1024 resolution

See original GitHub issue

I’m trying to train on a 1024x1024 dataset on a V100 GPU. I tried both the TensorFlow version and the PyTorch version. Despite setting --batch-gpu to 1, the TensorFlow version always runs out of system RAM (after the first tick; total system RAM is 51 GB), and the PyTorch version always runs out of CUDA memory (before the first tick).

Here are my training settings:

python run_network.py --train --metrics 'none' --gpus 0 --batch-gpu 1 --resolution 1024 \
 --ganformer-default --expname art1 --dataset 1024art

Also, I always encounter the warning: tcmalloc: large alloc
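
One way to narrow down where the system RAM goes is to log the process's peak memory at a few points in the training loop (for example before and after the first tick). The snippet below is a minimal sketch using only the Python standard library; the helper names `peak_rss_mib` and `log_memory` are hypothetical and not part of the repository, and it assumes a Unix-like system where the `resource` module is available.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Return the peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_memory(tag: str) -> str:
    """Print and return a one-line peak-memory report labelled with `tag`."""
    line = f"[{tag}] peak RSS: {peak_rss_mib():.1f} MiB"
    print(line)
    return line


if __name__ == "__main__":
    log_memory("start")
    buffer = bytearray(64 * 1024 * 1024)  # allocate 64 MiB to show the peak moving
    log_memory("after allocation")
```

Calling `log_memory` around the step that saves snapshots or visualizations would show whether the jump in RAM happens there or in the training step itself.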

Issue Analytics

State:
Created 2 years ago
Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction

BlueberryGin commented, Feb 17, 2022

Btw, training currently runs fine for me when I save only the output images and not the attention maps.
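
To get a rough feel for why saving attention maps could matter at this resolution, here is a back-of-the-envelope sketch. It assumes one float32 weight per (pixel, latent) pair, and the value of 16 latents is a hypothetical placeholder; how the repository actually stores or renders its attention maps is not stated in this thread, so treat the numbers as an illustration only.

```python
def attention_map_bytes(resolution: int, num_latents: int,
                        num_images: int = 1, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold one full-resolution attention map per image.

    Assumes one weight per (pixel, latent) pair, float32 by default.
    """
    pixels = resolution * resolution
    return pixels * num_latents * bytes_per_value * num_images


if __name__ == "__main__":
    # num_latents=16 is an assumed value, used only for illustration.
    for res in (256, 512, 1024):
        mib = attention_map_bytes(res, num_latents=16) / 2**20
        print(f"{res}x{res}: {mib:.0f} MiB per image per attention layer")
```

Under these assumptions the footprint grows with the square of the resolution, so a 1024x1024 map takes 16 times the memory of a 256x256 one, multiplied again by the number of layers and images kept in memory at once.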

0 reactions

dorarad commented, Feb 21, 2022

I’ll update the default options accordingly so that people won’t run into memory issues. Thank you for opening this issue!
