TorchInductor: "an illegal memory access" once and forever
🐛 Describe the bug
I hit this while trying to reproduce #1778; I'm not sure it's exactly the same issue, so I'm opening a new one.
Repro:
from typing import List
import torch
import torch._dynamo
import torch._inductor
from torch._inductor import config
import logging
from torchvision import models
import math
# torch._dynamo.config.log_level = logging.DEBUG
# torch._dynamo.config.verbose = True
# torch._inductor.config.debug = True
def convert_size(size_bytes):
    if size_bytes == 0:
        return "0B"
    size_name = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
    i = int(math.floor(math.log(size_bytes, 1024)))
    p = math.pow(1024, i)
    s = round(size_bytes / p, 2)
    return "%s %s" % (s, size_name[i])
resnet18 = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
batch_size = 4096
# batch_size = 1024
device = "cuda"
resnet18 = resnet18.eval().to(device)
opt_resnet18 = torch._dynamo.optimize("inductor")(resnet18)
# opt_resnet18 = resnet18
count = 0
while batch_size >= 500 and count < 5:
    try:
        print("batch size = ", batch_size)
        print("start: ", convert_size(torch.cuda.memory_allocated()))
        input = torch.randn((batch_size, 3, 224, 224)).to(device)
        output = opt_resnet18(input)
        print(output.shape)
    except RuntimeError as e:
        print(e)
        print("in runtime error: ", convert_size(torch.cuda.memory_allocated()))
    print("end: ", convert_size(torch.cuda.memory_allocated()))
    count += 1
    batch_size = int(batch_size / 2)
When running native PyTorch, you get:
batch size = 4096
start: 44.69 MB
CUDA out of memory. Tried to allocate 3.06 GiB (GPU 0; 39.41 GiB total capacity; 36.03 GiB already allocated; 1.71 GiB free; 36.05 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
in runtime error: 36.03 GB
end: 2.34 GB
batch size = 2048
start: 2.34 GB
CUDA out of memory. Tried to allocate 392.00 MiB (GPU 0; 39.41 GiB total capacity; 37.56 GiB already allocated; 180.50 MiB free; 37.58 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
in runtime error: 37.56 GB
end: 1.19 GB
batch size = 1024
start: 1.19 GB
torch.Size([1024, 1000])
end: 21.2 GB
batch size = 512
start: 21.2 GB
torch.Size([512, 1000])
end: 10.63 GB
When running Dynamo + Inductor, you get:
batch size = 4096
start: 44.69 MB
CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
in runtime error: 32.2 GB
end: 2.34 GB
batch size = 2048
start: 2.34 GB
CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
in runtime error: 2.34 GB
end: 2.34 GB
batch size = 1024
start: 2.34 GB
CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
in runtime error: 2.34 GB
end: 2.34 GB
batch size = 512
start: 2.34 GB
CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
in runtime error: 2.34 GB
end: 2.34 GB
Actually, batch size = 1024 is the first batch size that runs without error during this search. But with Inductor it keeps failing even when the batch size is 1024 or less; I think we keep using the same generated Triton code, which doesn't change as the input shape changes.
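A minimal sketch of a caller-side mitigation, assuming stale compiled state is the culprit (the helper name below is illustrative, not a PyTorch API): call torch._dynamo.reset() before each attempt so every batch size goes through compilation again instead of reusing kernels generated for the shape that failed.

import torch
import torch._dynamo

def run_once_with_fresh_compile(model, batch_size, device="cuda"):
    # Drop previously compiled graphs so this call recompiles for the
    # current input shape instead of reusing stale Triton kernels.
    torch._dynamo.reset()
    opt_model = torch._dynamo.optimize("inductor")(model)
    x = torch.randn((batch_size, 3, 224, 224), device=device)
    with torch.no_grad():
        return opt_model(x)

In the repro above this would replace the single torch._dynamo.optimize call outside the loop with a reset-and-recompile per attempt; if the illegal memory access still reproduces after a reset, cached state is probably not the whole story.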
Error logs
No response
Minified repro
No response
Top GitHub Comments
Cool, then I think we should catch OOM and reset env at Inductor level to prevent “OOM once and OOM forever”.
OOM is a recoverable error.
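For reference, a rough sketch of what treating OOM as recoverable looks like from the caller's side today (the function name and retry policy are illustrative, not an existing API): catch the allocator's RuntimeError, release cached blocks, and retry with a smaller batch.

import torch

def find_max_batch_size(run_step, start=4096, floor=500):
    # Halve the batch size on OOM until a step succeeds or we hit the floor.
    batch_size = start
    while batch_size >= floor:
        try:
            run_step(batch_size)
            return batch_size
        except RuntimeError as e:
            if "out of memory" not in str(e):
                raise
            # Recoverable path: free cached allocator blocks so the next,
            # smaller attempt starts from a clean state.
            torch.cuda.empty_cache()
            batch_size //= 2
    return None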