If you OOM once you OOM forever

Assume that, while using inductor, you have batch sizes b_1 and b_2 with b_1 < b_2. If you run b_1 first, the model doesn’t OOM; if you then run b_2, it OOMs.

The problem is the reverse order: if you run b_2 first and it OOMs, then running b_1 afterwards also OOMs, even though b_1 fits when it is run on its own.

Calling dynamo.reset() (torch._dynamo.reset()) might help as a workaround until the OOM issues are all fixed: https://github.com/pytorch/pytorch/blob/master/torch/_dynamo/__init__.py#L31
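
A minimal sketch of that workaround, assuming a hypothetical `run_step(batch_size)` callable that runs one compiled iteration. The names `try_batch_sizes`, `default_reset`, and `is_oom` are illustrative and not part of PyTorch; the only PyTorch calls used are `torch._dynamo.reset()`, `torch.cuda.is_available()`, and `torch.cuda.empty_cache()`. Whether this actually recovers the memory in the case reported here is not confirmed.

```python
import gc


def default_reset():
    """Drop dynamo's compiled state and release cached CUDA blocks.

    Assumes PyTorch is installed; imported lazily so the helper below
    can also be exercised with a fake reset function.
    """
    import torch
    import torch._dynamo

    torch._dynamo.reset()
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def is_oom(exc):
    """Heuristic: CUDA OOM errors are RuntimeErrors mentioning 'out of memory'."""
    return "out of memory" in str(exc).lower()


def try_batch_sizes(run_step, batch_sizes, reset=default_reset):
    """Run each batch size once; return {batch_size: succeeded}.

    After an OOM, reset() is called outside the except block so the
    exception's traceback no longer keeps tensors alive.
    """
    results = {}
    for bs in batch_sizes:
        oom = False
        try:
            run_step(bs)
        except RuntimeError as exc:
            if not is_oom(exc):
                raise
            oom = True
        results[bs] = not oom
        if oom:
            reset()
    return results


# Hypothetical usage, largest batch size first:
#   results = try_batch_sizes(run_step, [32, 16, 8])
```

If the smaller batch size still fails after the reset, that suggests the memory is being held somewhere the reset does not reach.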

Issue Analytics

State:
Created a year ago
Comments: 8 (8 by maintainers)

Top GitHub Comments

2 reactions

yanboliang commented, Oct 28, 2022

I can reproduce this locally. If I print out torch.cuda.memory_allocated() during each iteration, this is what I get with dynamo:

batch size =  32
239657472
batch size =  16
467977216
batch size =  8
694494720
batch size =  4
923125760
batch size =  3
1150851584
batch size =  2
1378315264
batch size =  1
1604861440

However, this is native PyTorch:

batch size =  32
239657472
batch size =  16
233169408
batch size =  8
239559168
batch size =  4
232202752
batch size =  3
240456192
batch size =  2
232194560
batch size =  1
241365504

It seems dynamo doesn’t free some memory after each iteration, so the memory keeps growing.
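
To put a number on that growth, the per-iteration differences can be computed directly from the values printed above. This is plain arithmetic on the reported readings; the variable and function names are illustrative.

```python
# torch.cuda.memory_allocated() readings in bytes, copied from the two runs above.
dynamo = [239657472, 467977216, 694494720, 923125760,
          1150851584, 1378315264, 1604861440]
native = [239657472, 233169408, 239559168, 232202752,
          240456192, 232194560, 241365504]


def deltas(readings):
    """Difference between each reading and the one before it."""
    return [later - earlier for earlier, later in zip(readings, readings[1:])]


dynamo_deltas = deltas(dynamo)
avg_mib = sum(dynamo_deltas) / len(dynamo_deltas) / 2**20
native_spread_mib = (max(native) - min(native)) / 2**20

print(f"dynamo: about {avg_mib:.0f} MiB retained per iteration")
print(f"native: readings stay within {native_spread_mib:.1f} MiB of each other")
```

With dynamo, allocated memory rises by roughly 217 MiB on every iteration even though the batch size is shrinking, while the native readings vary by under 9 MiB.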

0 reactions

yanboliang commented, Nov 15, 2022

Sounds good, let me check if I can reproduce with this repro.