
[Bug/Feature] AttributeError: 'AmpOptimizerState' object has no attribute 'all_fp16_params'

The following code:

import torch
import torch.nn as nn 
from apex import amp

print(torch.__version__)
t = torch.tensor([0])
opt = torch.optim.SGD([t], lr=0.01)
model = nn.Linear(1, 1).to("cuda")
model, opt = amp.initialize(model, opt, opt_level='O2')
epochs = 35
scheduler = torch.optim.lr_scheduler.StepLR(opt, gamma=0.1, step_size=3)
model.zero_grad()
for e in range(epochs):
    scheduler.step()
    opt.step()

gives the error

1.2.0a0+1252899
Selected optimization level O2:  FP16 training with FP32 batchnorm and FP32 master weights.

Defaults for this optimization level are:
enabled                : True
opt_level              : O2
cast_model_type        : torch.float16
patch_torch_functions  : False
keep_batchnorm_fp32    : True
master_weights         : True
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O2
cast_model_type        : torch.float16
patch_torch_functions  : False
keep_batchnorm_fp32    : True
master_weights         : True
loss_scale             : dynamic

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-a90bf57a066a> in <module>
     19 for e in range(epochs):
     20     scheduler.step()
---> 21     opt.step()

/home/project/tmp/pytorch/torch/optim/lr_scheduler.py in wrapper(*args, **kwargs)
     34             def wrapper(*args, **kwargs):
     35                 opt._step_count += 1
---> 36                 return func(*args, **kwargs)
     37             wrapper._with_counter = True
     38             return wrapper

/opt/conda/envs/pth-dev/lib/python3.7/site-packages/apex/amp/_process_optimizer.py in new_step(self)
    287         def new_step(self):
    288             retval = old_step()
--> 289             self._master_params_to_model_params()
    290             # Clear the master grads that wouldn't be zeroed by model.zero_grad()
    291             for param in self._amp_stash.all_fp32_from_fp16_params:

/opt/conda/envs/pth-dev/lib/python3.7/site-packages/apex/amp/_process_optimizer.py in _master_params_to_model_params(self)
    243     stash = self._amp_stash
    244     if multi_tensor_applier.available:
--> 245         if len(stash.all_fp16_params) > 0:
    246             multi_tensor_applier(
    247                 stash.multi_tensor_scale,

AttributeError: 'AmpOptimizerState' object has no attribute 'all_fp16_params'

I agree that the problem is that opt.zero_grad() hasn’t been called, but some users may be relying on the above pattern, e.g.

https://gist.github.com/thomwolf/ac7a7da6b1888c2eeac8ac8b9b05d3d3#file-gradient_accumulation-py

HTH
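
For context, the pattern in that gist accumulates gradients over several iterations and only zeroes them periodically, so zero_grad() is deliberately not called before every step. A rough sketch of that pattern with apex amp (the accumulation_steps value and the dummy data are illustrative, not taken from the gist verbatim):

import torch
import torch.nn as nn
from apex import amp

model = nn.Linear(1, 1).to("cuda")
opt = torch.optim.SGD(model.parameters(), lr=0.01)
model, opt = amp.initialize(model, opt, opt_level='O2')

accumulation_steps = 4          # illustrative value
model.zero_grad()
for i in range(16):
    x = torch.randn(8, 1, device="cuda")
    loss = model(x).mean() / accumulation_steps
    with amp.scale_loss(loss, opt) as scaled_loss:
        scaled_loss.backward()  # gradients accumulate across iterations
    if (i + 1) % accumulation_steps == 0:
        opt.step()              # update only every accumulation_steps iterations
        model.zero_grad()       # ...so zero_grad() is not called before every step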

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
vfdev-5 commented, Jun 6, 2019

@ptrblck @mcarilli thank you for your answers! Piotr’s answer works for me. My real use case is actually in a unit test of the LR scheduler and the optimizer, which is why I don’t have a loss function… When I discovered the error, I just wanted to check with you whether it is a bug or not 😃
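
If the goal is only to check the schedule itself, one option (an assumption on my part, not something suggested in the thread) is to leave amp out of the test and read the learning rate off the optimizer’s param groups; SGD simply skips parameters whose grad is None, so no loss function is needed:

import torch

t = torch.zeros(1, requires_grad=True)
opt = torch.optim.SGD([t], lr=0.01)
scheduler = torch.optim.lr_scheduler.StepLR(opt, gamma=0.1, step_size=3)

lrs = []
for e in range(9):
    opt.step()       # no gradients set, so this is a no-op for t
    scheduler.step()
    lrs.append(opt.param_groups[0]["lr"])

# StepLR with step_size=3 and gamma=0.1 drops the LR by 10x every 3 steps
print(lrs)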

1 reaction
ptrblck commented, Jun 6, 2019

Just for clarification: the training will work using model.zero_grad() in a standard training loop:

model = nn.Linear(1, 1).to("cuda")
opt = torch.optim.SGD(model.parameters(), lr=0.01)
model, opt = amp.initialize(model, opt, opt_level='O2')
epochs = 35
scheduler = torch.optim.lr_scheduler.StepLR(opt, gamma=0.1, step_size=3)

for e in range(epochs):
    model.zero_grad()
    scheduler.step()
    
    output = model(torch.randn(1, 1, device='cuda'))
    loss = output.mean()
    with amp.scale_loss(loss, opt) as scaled_loss:
        scaled_loss.backward()
    
    opt.step()

However, I might have misunderstood the use case, as you are passing t to the optimizer, while the parameters of the model don’t seem to be updated. Are you planning on somehow setting the gradients manually, @vfdev-5?
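
For reference, in plain PyTorch (without amp) a gradient can indeed be set by hand before calling step(); a minimal sketch, purely illustrative and not from the thread:

import torch

t = torch.zeros(1, requires_grad=True)
opt = torch.optim.SGD([t], lr=0.01)

opt.zero_grad()
t.grad = torch.ones_like(t)  # hand-set gradient instead of calling backward()
opt.step()                   # t becomes 0 - lr * grad = -0.01
print(t)

Under O2 this alone would presumably not suffice, since amp keeps FP32 master weights that are normally updated from the scaled backward pass, so treat it as a plain-PyTorch illustration only.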
