OOM at 48 GB of GPU memory during GPT-2 inference due to a memory leak in DirectML.
First of all, thank you for all your work; it's very exciting to see the Windows/AMD ML gap being closed! The issue is:
git clone https://github.com/openai/gpt-2.git
cd gpt-2
python -m pip install -r requirements.txt
python3 download_model.py 1558M
python src/interactive_conditional_samples.py 1558M
Given any text for inference, it repeatedly consumes all 48 GB of GPU memory (AMD Radeon VII 16 GB + 32 GB shared memory) and fails with:
(0) Resource exhausted: OOM when allocating tensor with shape[1,48,2,25,455,64] and type float on /job:localhost/replica:0/task:0/device:DML:0 by allocator DmlAllocator
[[node sample_sequence/while/concat (defined at F:\DSML\Soft\Anaconda\envs\directml\lib\site-packages\tensorflow_core\python\framework\ops.py:1762) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
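Following the hint above, the allocation report can be enabled by passing a RunOptions object to the sess.run call in src/interactive_conditional_samples.py. A minimal sketch, assuming the stock TF 1.15 API that tensorflow-directml is based on (the exact call site shown is illustrative):

import tensorflow as tf

# Ask TF to dump the list of live tensor allocations when an OOM is hit
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

# e.g. on the sampling run in interactive_conditional_samples.py:
out = sess.run(output, feed_dict={context: [context_tokens]}, options=run_options)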
Full log and environment are in the comments below. This is probably related to the resource-release issue seen with:
from ai_benchmark import AIBenchmark
# Run the AI Benchmark suite on the default (DML) device
benchmark = AIBenchmark(use_CPU=None, verbose_level=1)
results = benchmark.run()
which also fails with OOM during execution. Memory for tensors is not released between runs, or even after sess.close().
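For reference, the clean-up pattern that was tried looks roughly like the sketch below (identifiers are illustrative, standard TF 1.x API); even with the session closed and the default graph reset, the DML allocator does not appear to return the memory:

import numpy as np
import tensorflow as tf

def run_once():
    # Build a throw-away graph, run some work on the default (DML:0) device,
    # and tear everything down before returning.
    graph = tf.Graph()
    with graph.as_default():
        x = tf.constant(np.random.rand(1024, 1024), dtype=tf.float32)
        y = tf.matmul(x, x)
        with tf.Session(graph=graph) as sess:  # sess.close() on exit
            sess.run(y)
    tf.reset_default_graph()

for _ in range(10):
    run_once()  # observed here: GPU memory keeps growing across iterations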
Top GitHub Comments
@Herobring , we just released tensorflow-directml 1.15.3.dev200911 with many improvements to the memory allocator. You can try it out and tell us how it goes!
Also, since we have now open-sourced our fork, new tensorflow-directml issues should be opened over here.
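To try the new build, something like the following should work (a sketch; the version string is taken from the comment above):

python -m pip install --upgrade tensorflow-directml==1.15.3.dev200911

Then, to confirm the DML device is still visible (device_lib is the standard TF 1.x helper for listing local devices):

import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)  # expect a 1.15.3 dev build
print([d.name for d in device_lib.list_local_devices()])  # expect a '/device:DML:0' entry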
Thank you for your interest in making tensorflow-directml better, @Herobring. This is an area of improvement we're actively looking at, and every data point is appreciated. Our end goal is to dramatically improve our memory usage patterns and bring them closer to what people expect from CUDA or ROCm devices, but we're not there yet, as we have been focusing more on operator coverage in the past.
I’ll make sure to update this issue with a comment once we have made progress on this!