examples/imagenet still fails
See https://github.com/pytorch/torchdynamo/issues/1687 for original context.
It is now failing on the latest PyTorch master. First I ran into a parallel compile issue, for which I put up a patch: https://github.com/pytorch/pytorch/pull/87174
With that patch applied, it still fails with a different CUDAGraphs error.
$ python main.py --gpu 0 /home/soumith/dataset/imagenet
/home/soumith/code/vision/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension:
warn(f"Failed to load image Python extension: {e}")
/home/soumith/code/examples/imagenet/main.py:100: UserWarning: You have chosen a specific GPU. This will completely disable data parallelism.
warnings.warn('You have chosen a specific GPU. This will completely '
Use GPU: 0 for training
=> creating model 'resnet18'
make_fallback(aten.unfold): a decomposition exists, we should switch to it
make_fallback(aten.unfold_backward): a decomposition exists, we should switch to it
Traceback (most recent call last):
File "/home/soumith/code/examples/imagenet/main.py", line 513, in <module>
main()
File "/home/soumith/code/examples/imagenet/main.py", line 121, in main
main_worker(args.gpu, ngpus_per_node, args)
File "/home/soumith/code/examples/imagenet/main.py", line 280, in main_worker
train(train_loader, model, criterion, optimizer, epoch, device, args)
File "/home/soumith/code/examples/imagenet/main.py", line 327, in train
output = model(images)
File "/home/soumith/code/pytorch/torch/_dynamo/eval_frame.py", line 137, in __call__
return self.forward(*args, **kwargs)
File "/home/soumith/code/pytorch/torch/_dynamo/eval_frame.py", line 134, in forward
return optimized_forward(*args, **kwargs)
File "/home/soumith/code/pytorch/torch/_dynamo/eval_frame.py", line 157, in _fn
return fn(*args, **kwargs)
File "/home/soumith/code/vision/torchvision/models/resnet.py", line 284, in forward
def forward(self, x: Tensor) -> Tensor:
File "/home/soumith/code/pytorch/torch/_dynamo/eval_frame.py", line 157, in _fn
return fn(*args, **kwargs)
File "/home/soumith/code/pytorch/functorch/_src/aot_autograd.py", line 856, in forward
return compiled_f(
File "/home/soumith/code/pytorch/functorch/_src/aot_autograd.py", line 847, in new_func
return compiled_fn(args)
File "/home/soumith/code/pytorch/functorch/_src/aot_autograd.py", line 230, in g
return f(*args)
File "/home/soumith/code/pytorch/functorch/_src/aot_autograd.py", line 475, in compiled_function
return CompiledFunction.apply(*remove_dupe_args(args))
File "/home/soumith/code/pytorch/functorch/_src/aot_autograd.py", line 442, in forward
fw_outs = call_func_with_args(
File "/home/soumith/code/pytorch/functorch/_src/aot_autograd.py", line 255, in call_func_with_args
out = normalize_as_list(f(args))
File "/home/soumith/code/pytorch/torch/_inductor/compile_fx.py", line 179, in run
return model(new_inputs_to_cuda)
File "/home/soumith/code/pytorch/torch/_inductor/compile_fx.py", line 196, in run
compiled_fn = cudagraphify_impl(model, new_inputs, static_input_idxs)
File "/home/soumith/code/pytorch/torch/_inductor/compile_fx.py", line 254, in cudagraphify_impl
model(list(static_inputs))
File "/tmp/torchinductor_soumith/yz/cyzv2xzkmvwv33lxnmvd7lvgj4sq7l75r2jp76hekwqzumu2ovoo.py", line 1791, in call
assert_size_stride(buf56, (256, 128, 28, 28), (100352, 1, 3584, 128))
AssertionError: expected size 128==128, stride 784==1 at dim=1
Top GitHub Comments
Actually, the minifier works. I didn’t know that I should run minifier_launcher.py. Here’s the minified repro:
Okay, so I think I figured it out. My install doesn’t have cuDNN.
In this case, channels_last is not respected. I think your PR doesn’t check for this case.
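For anyone reading the assertion message: the two stride values it reports line up with the two memory formats. Below is a small pure-Python sketch (the helper names are made up for illustration and are not part of PyTorch or Inductor) that computes the element strides for the shape of buf56 in both layouts. It suggests the generated code expected a channels_last buffer but received a plain contiguous one, which would fit the missing cuDNN explanation above.

```python
def contiguous_strides(shape):
    """Row-major (NCHW) strides, in elements, for the given shape."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def channels_last_strides(shape):
    """Strides for a 4-D NCHW-shaped tensor stored in NHWC (channels_last) order."""
    n, c, h, w = shape
    return (h * w * c, 1, w * c, c)


shape = (256, 128, 28, 28)  # size of buf56 in the failing assert_size_stride call

print("contiguous:   ", contiguous_strides(shape))
print("channels_last:", channels_last_strides(shape))
```

The channels_last result matches the strides the assert expected, and the contiguous result has 784 at dim 1, which is the value the assert actually saw.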