examples/imagenet still fails

See https://github.com/pytorch/torchdynamo/issues/1687 for original context.

It is now failing on the latest pytorch master. First I ran into a parallel compile issue, for which I put up a patch: https://github.com/pytorch/pytorch/pull/87174

With that patch applied, it still fails, now with a different CUDAGraphs error.

$ python main.py --gpu 0 /home/soumith/dataset/imagenet
/home/soumith/code/vision/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension:
  warn(f"Failed to load image Python extension: {e}")
/home/soumith/code/examples/imagenet/main.py:100: UserWarning: You have chosen a specific GPU. This will completely disable data parallelism.
  warnings.warn('You have chosen a specific GPU. This will completely '
Use GPU: 0 for training
=> creating model 'resnet18'
make_fallback(aten.unfold): a decomposition exists, we should switch to it
make_fallback(aten.unfold_backward): a decomposition exists, we should switch to it
Traceback (most recent call last):
  File "/home/soumith/code/examples/imagenet/main.py", line 513, in <module>
    main()
  File "/home/soumith/code/examples/imagenet/main.py", line 121, in main
    main_worker(args.gpu, ngpus_per_node, args)
  File "/home/soumith/code/examples/imagenet/main.py", line 280, in main_worker
    train(train_loader, model, criterion, optimizer, epoch, device, args)
  File "/home/soumith/code/examples/imagenet/main.py", line 327, in train
    output = model(images)
  File "/home/soumith/code/pytorch/torch/_dynamo/eval_frame.py", line 137, in __call__
    return self.forward(*args, **kwargs)
  File "/home/soumith/code/pytorch/torch/_dynamo/eval_frame.py", line 134, in forward
    return optimized_forward(*args, **kwargs)
  File "/home/soumith/code/pytorch/torch/_dynamo/eval_frame.py", line 157, in _fn
    return fn(*args, **kwargs)
  File "/home/soumith/code/vision/torchvision/models/resnet.py", line 284, in forward
    def forward(self, x: Tensor) -> Tensor:
  File "/home/soumith/code/pytorch/torch/_dynamo/eval_frame.py", line 157, in _fn
    return fn(*args, **kwargs)
  File "/home/soumith/code/pytorch/functorch/_src/aot_autograd.py", line 856, in forward
    return compiled_f(
  File "/home/soumith/code/pytorch/functorch/_src/aot_autograd.py", line 847, in new_func
    return compiled_fn(args)
  File "/home/soumith/code/pytorch/functorch/_src/aot_autograd.py", line 230, in g
    return f(*args)
  File "/home/soumith/code/pytorch/functorch/_src/aot_autograd.py", line 475, in compiled_function
    return CompiledFunction.apply(*remove_dupe_args(args))
  File "/home/soumith/code/pytorch/functorch/_src/aot_autograd.py", line 442, in forward
    fw_outs = call_func_with_args(
  File "/home/soumith/code/pytorch/functorch/_src/aot_autograd.py", line 255, in call_func_with_args
    out = normalize_as_list(f(args))
  File "/home/soumith/code/pytorch/torch/_inductor/compile_fx.py", line 179, in run
    return model(new_inputs_to_cuda)
  File "/home/soumith/code/pytorch/torch/_inductor/compile_fx.py", line 196, in run
    compiled_fn = cudagraphify_impl(model, new_inputs, static_input_idxs)
  File "/home/soumith/code/pytorch/torch/_inductor/compile_fx.py", line 254, in cudagraphify_impl
    model(list(static_inputs))
  File "/tmp/torchinductor_soumith/yz/cyzv2xzkmvwv33lxnmvd7lvgj4sq7l75r2jp76hekwqzumu2ovoo.py", line 1791, in call
    assert_size_stride(buf56, (256, 128, 28, 28), (100352, 1, 3584, 128))
AssertionError: expected size 128==128, stride 784==1 at dim=1
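
For reference (not part of the original report): the strides the generated code asserts, (100352, 1, 3584, 128), are the channels_last strides for a (256, 128, 28, 28) tensor, while a plain contiguous NCHW tensor of that shape has strides (100352, 784, 28, 1). That contiguous stride of 784 at dim=1 is what shows up in the error. A quick standalone check:

import torch

# Same shape as the failing buffer buf56.
nchw = torch.empty(256, 128, 28, 28)
nhwc = nchw.contiguous(memory_format=torch.channels_last)

print(nchw.stride())  # (100352, 784, 28, 1)  - contiguous, stride 784 at dim=1
print(nhwc.stride())  # (100352, 1, 3584, 128) - channels_last, what the assert expects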

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

1 reaction
soumith commented, Oct 18, 2022

Actually, the minifier works; I didn't know that I should run minifier_launcher.py.
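
For anyone else landing here: the minifier is driven through the dynamo config. A minimal sketch of that flow, with the config/env names assumed from the dynamo troubleshooting docs of that time rather than taken from this thread:

# Ask dynamo to dump a repro when compilation fails after AOT/inductor lowering.
# Equivalent environment variable (assumed): TORCHDYNAMO_REPRO_AFTER="aot"
import torch._dynamo as dynamo
dynamo.config.repro_after = "aot"

# Re-run the failing script; on error dynamo writes minifier_launcher.py,
# and running it shrinks the graph down to a minimal repro:
#   python minifier_launcher.py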

Here’s the minified repro:

import torch
from torch import tensor, device
import torch.fx as fx
from torch._dynamo.testing import rand_strided
from math import inf
from torch.fx.experimental.proxy_tensor import make_fx

# torch version: 1.14.0a0+git240bba7
# torch cuda version: 11.6
# torch git version: 240bba7ac85b6163c7c75a168019cd0b6d1c6aa0


# CUDA Info:
# nvcc: NVIDIA (R) Cuda compiler driver
# Copyright (c) 2005-2022 NVIDIA Corporation
# Built on Tue_Mar__8_18:18:20_PST_2022
# Cuda compilation tools, release 11.6, V11.6.124
# Build cuda_11.6.r11.6/compiler.31057947_0

# GPU Hardware Info:
# NVIDIA GeForce RTX 3090 : 1


from torch.nn import *
class Repro(torch.nn.Module):
    def __init__(self):
        super().__init__()



    def forward(self, arg21_1, relu_4):
        convolution_7 = torch.ops.aten.convolution.default(relu_4, arg21_1, None, [2, 2], [0, 0], [1, 1], False, [0, 0], 1);  relu_4 = arg21_1 = None
        return (convolution_7,)

args = [((128, 64, 1, 1), (64, 1, 1, 1), torch.float32, 'cuda'), ((256, 64, 56, 56), (200704, 3136, 56, 1), torch.float32, 'cuda')]
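# Note: both tensors above use plain contiguous (NCHW) strides; neither input is channels_last.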
args = [rand_strided(sh, st, dt, dev) for (sh, st, dt, dev) in args]
mod = make_fx(Repro().to(device="cuda"))(*args)

from torch._inductor.compile_fx import compile_fx_inner
from torch._dynamo.debug_utils import same_two_models

compiled = compile_fx_inner(mod, args)
compiled(args)
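
Running this script should reproduce the assert_size_stride failure above. The layout assumption it encodes can also be poked at directly in eager mode; a hypothetical standalone probe (not from the original comment, and run separately, since the line above raises):

import torch

# Same convolution as the repro, but with a channels_last activation:
# does the eager kernel preserve that layout on this build?
w = torch.randn(128, 64, 1, 1, device="cuda")
x = torch.randn(256, 64, 56, 56, device="cuda").contiguous(memory_format=torch.channels_last)
out = torch.ops.aten.convolution.default(x, w, None, [2, 2], [0, 0], [1, 1], False, [0, 0], 1)
print(out.stride())
print(out.is_contiguous(memory_format=torch.channels_last))  # False here, per the diagnosis below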

0 reactions
soumith commented, Oct 19, 2022

Okay, so I think I figured it out: my install doesn't have any cuDNN.

print(torch.__config__.show())
PyTorch built with:
  - GCC 9.4
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2022.1-Product Build 20220311 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.6
  - NVCC architecture flags: -gencode;arch=compute_86,code=sm_86
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.14.0, USE_CUDA=ON, USE_CUDNN=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

In this case (no cuDNN), channels_last is not respected. Your PR doesn't check for this case, I think.
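
A quick runtime counterpart to the build flags above (a generic sketch, not something from the thread):

import torch

# USE_CUDNN=OFF in the build settings shows up at runtime as:
print(torch.backends.cudnn.is_available())  # False on this build
print(torch.backends.cudnn.version())       # None when cuDNN is absent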
