
[Bug/Feature] AttributeError: 'AmpOptimizerState' object has no attribute 'all_fp16_params'

The following code:

import torch
import torch.nn as nn 
from apex import amp

print(torch.__version__)
t = torch.tensor([0])
opt = torch.optim.SGD([t], lr=0.01)
model = nn.Linear(1, 1).to("cuda")
model, opt = amp.initialize(model, opt, opt_level='O2')
epochs = 35
scheduler = torch.optim.lr_scheduler.StepLR(opt, gamma=0.1, step_size=3)
model.zero_grad()
for e in range(epochs):
    scheduler.step()
    opt.step()

gives the error

1.2.0a0+1252899
Selected optimization level O2:  FP16 training with FP32 batchnorm and FP32 master weights.

Defaults for this optimization level are:
enabled                : True
opt_level              : O2
cast_model_type        : torch.float16
patch_torch_functions  : False
keep_batchnorm_fp32    : True
master_weights         : True
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O2
cast_model_type        : torch.float16
patch_torch_functions  : False
keep_batchnorm_fp32    : True
master_weights         : True
loss_scale             : dynamic

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-10-a90bf57a066a> in <module>
     19 for e in range(epochs):
     20     scheduler.step()
---> 21     opt.step()

/home/project/tmp/pytorch/torch/optim/lr_scheduler.py in wrapper(*args, **kwargs)
     34             def wrapper(*args, **kwargs):
     35                 opt._step_count += 1
---> 36                 return func(*args, **kwargs)
     37             wrapper._with_counter = True
     38             return wrapper

/opt/conda/envs/pth-dev/lib/python3.7/site-packages/apex/amp/_process_optimizer.py in new_step(self)
    287         def new_step(self):
    288             retval = old_step()
--> 289             self._master_params_to_model_params()
    290             # Clear the master grads that wouldn't be zeroed by model.zero_grad()
    291             for param in self._amp_stash.all_fp32_from_fp16_params:

/opt/conda/envs/pth-dev/lib/python3.7/site-packages/apex/amp/_process_optimizer.py in _master_params_to_model_params(self)
    243     stash = self._amp_stash
    244     if multi_tensor_applier.available:
--> 245         if len(stash.all_fp16_params) > 0:
    246             multi_tensor_applier(
    247                 stash.multi_tensor_scale,

AttributeError: 'AmpOptimizerState' object has no attribute 'all_fp16_params'

I agree that the problem is that opt.zero_grad() hasn’t been called, but some users may be relying on the above pattern, e.g.

https://gist.github.com/thomwolf/ac7a7da6b1888c2eeac8ac8b9b05d3d3#file-gradient_accumulation-py

HTH
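
For context, the pattern in that gist accumulates gradients over several iterations and only zeroes them periodically, so zero_grad() is deliberately not called before every step. A rough sketch of that pattern with apex amp (the accumulation_steps value and the dummy data are illustrative, not taken from the gist verbatim):

import torch
import torch.nn as nn
from apex import amp

model = nn.Linear(1, 1).to("cuda")
opt = torch.optim.SGD(model.parameters(), lr=0.01)
model, opt = amp.initialize(model, opt, opt_level='O2')

accumulation_steps = 4          # illustrative value
model.zero_grad()
for i in range(16):
    x = torch.randn(8, 1, device="cuda")
    loss = model(x).mean() / accumulation_steps
    with amp.scale_loss(loss, opt) as scaled_loss:
        scaled_loss.backward()  # gradients accumulate across iterations
    if (i + 1) % accumulation_steps == 0:
        opt.step()              # update only every accumulation_steps iterations
        model.zero_grad()       # ...so zero_grad() is not called before every step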

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
vfdev-5 commented, Jun 6, 2019

@ptrblck @mcarilli thank you for your answers! Piotr’s answer works for me. My real use case is actually in a unit test of the LR scheduler and the optimizer, which is why I don’t have a loss function… When I discovered the error, I just wanted to check with you whether it is a bug or not 😃
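
If the goal is only to check the schedule itself, one option (an assumption on my part, not something suggested in the thread) is to leave amp out of the test and read the learning rate off the optimizer’s param groups; SGD simply skips parameters whose grad is None, so no loss function is needed:

import torch

t = torch.zeros(1, requires_grad=True)
opt = torch.optim.SGD([t], lr=0.01)
scheduler = torch.optim.lr_scheduler.StepLR(opt, gamma=0.1, step_size=3)

lrs = []
for e in range(9):
    opt.step()       # no gradients set, so this is a no-op for t
    scheduler.step()
    lrs.append(opt.param_groups[0]["lr"])

# StepLR with step_size=3 and gamma=0.1 drops the LR by 10x every 3 steps
print(lrs)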

1 reaction
ptrblck commented, Jun 6, 2019

Just for clarification: the training will work using model.zero_grad() in a standard training loop:

model = nn.Linear(1, 1).to("cuda")
opt = torch.optim.SGD(model.parameters(), lr=0.01)
model, opt = amp.initialize(model, opt, opt_level='O2')
epochs = 35
scheduler = torch.optim.lr_scheduler.StepLR(opt, gamma=0.1, step_size=3)

for e in range(epochs):
    model.zero_grad()
    scheduler.step()
    
    output = model(torch.randn(1, 1, device='cuda'))
    loss = output.mean()
    with amp.scale_loss(loss, opt) as scaled_loss:
        scaled_loss.backward()
    
    opt.step()

However, I might have misunderstood the use case, as you are passing t to the optimizer, while the parameters of the model don’t seem to be updated. Are you planning on somehow setting the gradients manually, @vfdev-5?
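
For reference, in plain PyTorch (without amp) a gradient can indeed be set by hand before calling step(); a minimal sketch, purely illustrative and not from the thread:

import torch

t = torch.zeros(1, requires_grad=True)
opt = torch.optim.SGD([t], lr=0.01)

opt.zero_grad()
t.grad = torch.ones_like(t)  # hand-set gradient instead of calling backward()
opt.step()                   # t becomes 0 - lr * grad = -0.01
print(t)

Under O2 this alone would presumably not suffice, since amp keeps FP32 master weights that are normally updated from the scaled backward pass, so treat it as a plain-PyTorch illustration only.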
