Simple training script with loss causes an exception
The following steps cause a runtime exception, NotImplementedError. I am sorry in advance if I am making a mistake. Here is a repro. Using "eager" or "aot_eager" instead of "inductor" does not work either.
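For reference, the only change for those other backends would be the backend string passed to the decorator in the repro below; a minimal sketch:

import torchdynamo

# Sketch: the same failure is reported with the other backends; only the
# backend string passed to the decorator changes.
@torchdynamo.optimize("aot_eager")  # or "eager"; the full repro below uses "inductor"
def training_iter_fn(batch, model, optimizer):
    ...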
% git log | head -1
commit 8c9f11ca6f2789f06785e7606bdb99f087bcc73a
% pip list | grep torch
torch 1.13.0.dev20220927+cpu
torch-mlir 20220927.609
torchdynamo 1.13.0.dev0 /mnt/xvdc/DeepTools/DD2/torchdynamo
torchvision 0.14.0.dev20220927+cpu
% cat torchdynamo_loss.py
import torch
import torchdynamo
class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x, y):
        x = torch.bmm(x, y)
        x = torch.flatten(x, 1)
        return x

@torchdynamo.optimize("inductor")
def training_iter_fn(batch, model, optimizer):
    optimizer.zero_grad()
    out = model(**batch)
    lossFn = torch.nn.CrossEntropyLoss()
    target = torch.tensor([0, 7])
    loss = lossFn(out, target)
    loss.backward()
    optimizer.step()
    return loss
net = Net()
input1 = torch.randn(2, 1, 4)
input2 = torch.randn(2, 4, 8, requires_grad=True)
optimizer = torch.optim.Adam([input2], lr=0.1)
opt_training_iter_fn = training_iter_fn
batch = {"x":input1, "y":input2}
loss = opt_training_iter_fn(batch, net, optimizer)
print(input2.cpu())
% python torchdynamo_loss.py
[2022-10-12 06:42:13,178] torchdynamo.variables.torch: [WARNING] Profiler will be ignored
[2022-10-12 06:42:13,187] torchdynamo.symbolic_convert: [WARNING] Graph break: Tensor.backward from user code at File "/home/ishizaki/torchdynamo/tmp/torchdynamo_loss.py", line 26, in training_iter_fn
loss.backward()
[2022-10-12 06:42:14,922] torchdynamo.symbolic_convert: [WARNING] Graph break: inline with __closure__ from user code at File "/home/ishizaki/torchdynamo/tmp/torchdynamo_loss.py", line 27, in <graph break in training_iter_fn>
optimizer.step()
[2022-10-12 06:42:14,934] torchdynamo.symbolic_convert: [WARNING] Graph break: inline in skipfiles: _fn /home/ishizaki/torchdynamo/torchdynamo/eval_frame.py from user code at File "/home/ishizaki/torchdynamo/.venv/lib/python3.9/site-packages/torch/optim/adam.py", line 178, in step
self._cuda_graph_capture_health_check()
[2022-10-12 06:42:14,948] torchdynamo.convert_frame: [ERROR] WON'T CONVERT <graph break in step> /home/ishizaki/torchdynamo/.venv/lib/python3.9/site-packages/torch/optim/adam.py line 178
due to:
Traceback (most recent call last):
File "/home/ishizaki/torchdynamo/torchdynamo/variables/base.py", line 146, in as_python_constant
raise NotImplementedError(f"{self} is not a constant")
NotImplementedError: TensorVariable() is not a constant
from user code:
File "/home/ishizaki/torchdynamo/.venv/lib/python3.9/site-packages/torch/optim/adam.py", line 209, in <graph break in step>
state = self.state[p]
Set torchdynamo.config.verbose=True for more information
==========
[2022-10-12 06:42:14,961] torchdynamo.symbolic_convert: [WARNING] Graph break: Tensor.item from user code at File "/home/ishizaki/torchdynamo/.venv/lib/python3.9/site-packages/torch/optim/adam.py", line 300, in adam
func(params,
File "/home/ishizaki/torchdynamo/.venv/lib/python3.9/site-packages/torch/optim/adam.py", line 395, in _single_tensor_adam
step = step_t.item()
[2022-10-12 06:42:14,980] torchdynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-10-12 06:42:14,980] torchinductor.compile_fx: [WARNING] Aot Autograd is not safe to run, so falling back to eager
[2022-10-12 06:42:15,000] torchdynamo.optimizations.training: [WARNING] Unable to use Aot Autograd because of presence of mutation
[2022-10-12 06:42:15,000] torchinductor.compile_fx: [WARNING] Aot Autograd is not safe to run, so falling back to eager
tensor([[[ 1.2348, -0.7126, 1.2387, 2.0989, -0.1772, 0.4236, -0.0968,
-0.3471],
[ 0.6295, -1.5259, 0.6826, 1.0028, -0.4873, 0.0893, -0.2904,
0.1033],
[ 1.7367, 0.7345, 1.5238, -1.9054, -1.9447, 0.4717, 0.1325,
-0.6108],
[ 0.5933, 0.7319, 1.5816, -0.3573, 0.3974, -1.0648, -2.0550,
0.6247]],
[[ 0.4393, 0.4159, -0.4996, 0.3288, -0.9796, -0.0822, -0.6735,
0.4048],
[-1.1754, -0.2157, 1.0433, -0.3781, 0.5304, -2.7421, -1.1731,
-0.6624],
[ 0.3439, -0.4731, 0.4820, -0.1286, -0.1511, 0.4843, 1.1936,
1.2146],
[ 1.9118, -1.4318, -0.6035, 0.0142, 0.8406, 1.2690, -0.2417,
0.4326]]], requires_grad=True)
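For context, the failing line in adam.py (state = self.state[p]) indexes the optimizer's per-parameter state, which PyTorch keeps in a defaultdict keyed by the tensor being optimized. A minimal sketch of that pattern (my own illustration, not code taken from adam.py):

from collections import defaultdict

import torch

# Sketch of the pattern behind `state = self.state[p]`: Optimizer.state is a
# defaultdict whose keys are the tensors being optimized. Here `p` stands in
# for input2, a plain requires_grad tensor rather than an nn.Parameter, which
# is the case torchdynamo currently fails to specialize as a constant.
state = defaultdict(dict)
p = torch.randn(2, 4, 8, requires_grad=True)
state[p]["step"] = torch.tensor(0.0)
print(state[p])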
Top GitHub Comments
Thanks for providing the context, it makes sense! @soumith Currently we use defaultdict and Tensor as keys across all optimizers (including Adam in this case). We have to specialize the Tensor's value during compilation if it is used as a key. Right now we do that specialization only if it is an nn.Parameter, which works well for the majority of cases. If we want to support optimizing over any Tensor with requires_grad=True, we just need to relax this restriction. The downside is that it may take more memory if we specialize these non-parameter tensors. As you mentioned, this is a bit rarer, so we need to trade off. Anyway, I'll send a PR soon.

Yeah, this code works well without torchdynamo. Let me double-check the Adam API.
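Based on that explanation, one possible workaround (my own sketch, not something proposed in the issue) is to register the optimized tensor as an nn.Parameter, since that is the case torchdynamo already specializes:

import torch

# Hypothetical workaround sketch: wrap the optimized tensor in nn.Parameter so
# torchdynamo can specialize it when it is used as an optimizer-state key.
# Whether this avoids the graph break on this torchdynamo build is untested.
input2 = torch.nn.Parameter(torch.randn(2, 4, 8))
optimizer = torch.optim.Adam([input2], lr=0.1)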