
TorchInductor: "an illegal memory access" once and forever

See original GitHub issue

🐛 Describe the bug

I hit this while trying to reproduce #1778. I'm not sure if it's exactly the same issue, so I'm opening a new one.

Repro:

from typing import List
import torch
import torch._dynamo
import torch._inductor
from torch._inductor import config
import logging
from torchvision import models
import math

# torch._dynamo.config.log_level = logging.DEBUG
# torch._dynamo.config.verbose = True
# torch._inductor.config.debug = True

def convert_size(size_bytes):
    # Pretty-print a byte count as B/KB/MB/... for the memory reports below.
    if size_bytes == 0:
        return "0B"
    size_name = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
    i = int(math.floor(math.log(size_bytes, 1024)))
    p = math.pow(1024, i)
    s = round(size_bytes / p, 2)
    return "%s %s" % (s, size_name[i])

resnet18 = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

batch_size = 4096
# batch_size = 1024
device = "cuda"

resnet18 = resnet18.eval().to(device)
opt_resnet18 = torch._dynamo.optimize("inductor")(resnet18)
# opt_resnet18 = resnet18

# Probe progressively smaller batch sizes, printing allocated CUDA memory
# before and after each attempt.
count = 0
while batch_size >= 500 and count < 5:
    try:
        print("batch size = ", batch_size)
        print("start: ", convert_size(torch.cuda.memory_allocated()))
        input = torch.randn((batch_size, 3, 224, 224)).to(device)
        output = opt_resnet18(input)
        print(output.shape)
    except RuntimeError as e:
        print(e)
        print("in runtime error: ", convert_size(torch.cuda.memory_allocated()))

    print("end: ", convert_size(torch.cuda.memory_allocated()))
    count += 1
    batch_size = int(batch_size / 2)

When running native PyTorch, you get:

batch size =  4096
start:  44.69 MB
CUDA out of memory. Tried to allocate 3.06 GiB (GPU 0; 39.41 GiB total capacity; 36.03 GiB already allocated; 1.71 GiB free; 36.05 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
in runtime error:  36.03 GB
end:  2.34 GB
batch size =  2048
start:  2.34 GB
CUDA out of memory. Tried to allocate 392.00 MiB (GPU 0; 39.41 GiB total capacity; 37.56 GiB already allocated; 180.50 MiB free; 37.58 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
in runtime error:  37.56 GB
end:  1.19 GB
batch size =  1024
start:  1.19 GB
torch.Size([1024, 1000])
end:  21.2 GB
batch size =  512
start:  21.2 GB
torch.Size([512, 1000])
end:  10.63 GB

When running Dynamo + Inductor, you get:

batch size =  4096
start:  44.69 MB
CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
in runtime error:  32.2 GB
end:  2.34 GB
batch size =  2048
start:  2.34 GB
CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
in runtime error:  2.34 GB
end:  2.34 GB
batch size =  1024
start:  2.34 GB
CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
in runtime error:  2.34 GB
end:  2.34 GB
batch size =  512
start:  2.34 GB
CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
in runtime error:  2.34 GB
end:  2.34 GB

With native PyTorch, batch size = 1024 is the first batch size that runs without error during the search. But with Inductor, it keeps failing even when the batch size is 1024 or less. I think we are reusing the same generated Triton code, which doesn't change as the input shape changes.
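One way to test that hypothesis (a minimal sketch, not part of the original repro): clear Dynamo's compilation caches after the 4096-batch failure and compile a fresh copy at batch size 1024. If only the cached kernel was stale, this should succeed the way native PyTorch does; if the first failure already corrupted the CUDA context, even the recompiled run may still raise an illegal-memory-access error, and only a process restart recovers.

# Hypothesis check (sketch): recompile from a clean cache at the smaller batch size.
torch._dynamo.reset()                        # drop cached Dynamo/Inductor graphs and kernels
fresh_resnet18 = torch._dynamo.optimize("inductor")(resnet18)
x = torch.randn((1024, 3, 224, 224), device=device)
print(fresh_resnet18(x).shape)               # expect torch.Size([1024, 1000]) if only the cached kernel was stale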

Error logs

No response

Minified repro

No response

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
yanboliang commented, Nov 1, 2022

Cool, then I think we should catch OOM and reset env at Inductor level to prevent “OOM once and OOM forever”.
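Until such a guard exists inside Inductor, the same idea can be approximated from user code. The sketch below is an assumed caller-side workaround, not the proposed Inductor-level fix, and run_with_backoff is a hypothetical helper name: catch the failure, reset Dynamo's caches and the CUDA caching allocator, and retry with a smaller batch.

import torch
import torch._dynamo

def run_with_backoff(opt_model, batch_size, min_batch=500, device="cuda"):
    # Retry at progressively smaller batch sizes; after each failure, drop
    # compiled graphs so the next attempt recompiles for the new shape.
    while batch_size >= min_batch:
        try:
            x = torch.randn((batch_size, 3, 224, 224), device=device)
            return opt_model(x)
        except RuntimeError as e:            # CUDA OOM (and, here, the illegal access) surfaces as RuntimeError
            print(f"batch {batch_size} failed: {e}")
            torch._dynamo.reset()            # discard kernels compiled for the failed shape
            torch.cuda.empty_cache()         # return cached allocator blocks to the driver
            batch_size //= 2
    return None

This only helps while the failure is a plain OOM; once an illegal memory access has corrupted the CUDA context, an in-process reset typically cannot bring it back, which is why catching the OOM before it turns into an illegal access, as the comment suggests, is the better place for the fix.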

1 reaction
ngimel commented, Nov 1, 2022

OOM is a recoverable error.

Read more comments on GitHub >

Top Results From Across the Web

Index computations are done in int32 even for large tensors ...
TorchInductor: "an illegal memory access" once and forever pytorch/torchdynamo#1819. Open. @Jokeren added the bug label 2 weeks ago.

CUDA error: an illegal memory access was encountered
Hi, all. I am getting a weird illegal memory access error whenever I try to train a FasterRCNN model with an image size...

PyTorch CUDA error: an illegal memory access was ...
It was partially said by the answer of the OP, but the problem under the hood with illegal memory access is that the...

"ILLEGAL MEMORY ACCESS" - Daz 3D Forums
I'm seriously thinking about uninstall daz once and forever and buy poser. Could anyone explain this? (P.S. video cards are just unboxed.)

Cuda illegal memory access (kokkos) multiple MPI per GPU
I have encountered cuda illegal memory access (lib kokkos) when using multiple MPI per GPU. With KOKKOS, you should have only one MPI rank...
