Inductor - resnet18 - large batch size - CUDA error: an illegal memory access was encountered
🐛 Describe the bug
Repro:
import torch
import torch._dynamo
import torch._inductor
from torch._inductor import config
import logging
from torchvision import models

# Pretrained resnet18 in eval mode on the GPU
resnet18 = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
batch_size = 4096
device = "cuda"
resnet18 = resnet18.eval().to(device)

# Compile with the Inductor backend via TorchDynamo
opt_resnet18 = torch._dynamo.optimize("inductor")(resnet18)
input = torch.randn((batch_size, 3, 224, 224)).to(device)
output = opt_resnet18(input)
print(output.shape)
This only happens when the batch size is large.
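A quick sanity check (not included in the original report) is to run the same batch through the uncompiled model, assuming the GPU has enough memory for an eager forward pass at this batch size:

import torch
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval().to("cuda")
x = torch.randn((4096, 3, 224, 224), device="cuda")

# Eager (uncompiled) forward pass for comparison: if this succeeds while the
# Inductor-compiled model raises the illegal memory access, the problem lies
# in the generated kernels rather than in the model or the inputs.
with torch.inference_mode():
    out = model(x)
print(out.shape)  # expected: torch.Size([4096, 1000])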
Error logs
Traceback (most recent call last):
File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/utils.py", line 458, in preserve_rng_state
yield
File "/scratch/ybliang/work/repos/pytorch/torch/_inductor/compile_fx.py", line 202, in run
compiled_fn = cudagraphify_impl(model, new_inputs, static_input_idxs)
File "/scratch/ybliang/work/repos/pytorch/torch/_inductor/compile_fx.py", line 257, in cudagraphify_impl
model(list(static_inputs))
File "/tmp/torchinductor_ybliang/7q/c7qimro7rryowl6fbgxobggppym6ux4mwk4x5htmdqso66ydxlb3.py", line 691, in call
buf3 = empty_strided((4096, 64, 56, 56), (200704, 3136, 56, 1), device='cuda', dtype=torch.int64)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/scratch/ybliang/work/repos/pytorch/debug/debug5.py", line 35, in <module>
output = opt_resnet18(input)
File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/eval_frame.py", line 138, in __call__
return self.forward(*args, **kwargs)
File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/eval_frame.py", line 135, in forward
return optimized_forward(*args, **kwargs)
File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/eval_frame.py", line 166, in _fn
return fn(*args, **kwargs)
File "/scratch/ybliang/work/repos/torchvision/torchvision/models/resnet.py", line 284, in forward
def forward(self, x: Tensor) -> Tensor:
File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/eval_frame.py", line 166, in _fn
return fn(*args, **kwargs)
File "/scratch/ybliang/work/repos/pytorch/functorch/_src/aot_autograd.py", line 870, in forward
return compiled_f(
File "/scratch/ybliang/work/repos/pytorch/functorch/_src/aot_autograd.py", line 861, in new_func
return compiled_fn(args)
File "/scratch/ybliang/work/repos/pytorch/functorch/_src/aot_autograd.py", line 230, in g
return f(*args)
File "/scratch/ybliang/work/repos/pytorch/functorch/_src/aot_autograd.py", line 489, in compiled_function
return CompiledFunction.apply(*remove_dupe_args(args))
File "/scratch/ybliang/work/repos/pytorch/functorch/_src/aot_autograd.py", line 450, in forward
fw_outs = call_func_with_args(
File "/scratch/ybliang/work/repos/pytorch/functorch/_src/aot_autograd.py", line 255, in call_func_with_args
out = normalize_as_list(f(args))
File "/scratch/ybliang/work/repos/pytorch/torch/_inductor/compile_fx.py", line 185, in run
return model(new_inputs)
File "/scratch/ybliang/work/repos/pytorch/torch/_inductor/compile_fx.py", line 202, in run
compiled_fn = cudagraphify_impl(model, new_inputs, static_input_idxs)
File "/scratch/ybliang/work/env/lib/python3.9/contextlib.py", line 137, in __exit__
self.gen.throw(typ, value, traceback)
File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/utils.py", line 462, in preserve_rng_state
torch.cuda.set_rng_state(cuda_rng)
File "/scratch/ybliang/work/repos/pytorch/torch/cuda/random.py", line 64, in set_rng_state
_lazy_call(cb)
File "/scratch/ybliang/work/repos/pytorch/torch/cuda/__init__.py", line 176, in _lazy_call
callable()
File "/scratch/ybliang/work/repos/pytorch/torch/cuda/random.py", line 62, in cb
default_generator.set_state(new_state_copy)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
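As the error message suggests, rerunning with CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the illegal memory access is reported at the kernel that actually triggered it rather than at a later API call. A minimal sketch, assuming the variable is set before CUDA is initialized (it can equally be exported in the shell before launching the script):

import os

# Must be set before the first CUDA call; with blocking launches the failing
# kernel is reported at its actual launch site.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # import torch only after setting the variable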
Minified repro
No response
Issue Analytics
- Created a year ago
- Comments: 7 (7 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
No, this is a real IMA (illegal memory access): when there are more than INT_MAX elements, Triton doesn't generate correct indexing. Small(er) repro:
Note, we have an incorrect type annotation for xnumel here, but even after I make it i64 I still get IMA.
I'll be posting a couple of resnet18 memory fixes for non-cudagraphs later today.
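A back-of-the-envelope check (an illustration, not the maintainer's minified repro) shows how batch size 4096 pushes intermediate tensors in resnet18 past INT_MAX elements, which is consistent with the indexing overflow described above:

# Hypothetical illustration of the int32 overflow; the exact failing kernel is
# not shown in the issue.
INT_MAX = 2**31 - 1                      # 2,147,483,647

# Output of resnet18's first conv at batch size 4096: (4096, 64, 112, 112)
conv1_out_elems = 4096 * 64 * 112 * 112  # 3,288,334,336
print(conv1_out_elems > INT_MAX)         # True: element indices no longer fit in int32

# The int64 buffer allocated in the traceback: (4096, 64, 56, 56)
buf3_elems = 4096 * 64 * 56 * 56         # 822,083,584
print(buf3_elems * 8 > INT_MAX)          # True: byte offsets also overflow int32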