AttributeError occurred at amp.scale_loss
Problem
I tried to use amp and got the following error:
Traceback (most recent call last):
  File "train.py", line 75, in <module>
    main(args)
  File "train.py", line 54, in main
    [model_checkpoint, metrics_logger])
  File "/home/dwaydwaydway/adl/adl-hw1-example-code/src/base_predictor.py", line 75, in fit_dataset
    log_train = self._run_epoch(dataloader, True)
  File "/home/dwaydwaydway/adl/adl-hw1-example-code/src/base_predictor.py", line 182, in _run_epoch
    with amp.scale_loss(batch_loss, self.optimizer) as scaled_loss:
  File "/home/dwaydwaydway/anaconda3/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/home/dwaydwaydway/anaconda3/lib/python3.7/site-packages/apex/amp/handle.py", line 59, in scale_loss
    if not _amp_state.opt_properties.enabled:
AttributeError: 'AmpState' object has no attribute 'opt_properties'
> /home/dwaydwaydway/anaconda3/lib/python3.7/site-packages/apex/amp/handle.py(59)scale_loss()
-> if not _amp_state.opt_properties.enabled:
This is my code:
def _run_epoch(self, dataloader, training):
    # set model training/evaluation mode
    self.model.train(training)
    # run batches for train
    loss = 0
    # reset metric accumulators
    for metric in self.metrics:
        metric.reset()
    if training:
        iter_in_epoch = min(len(dataloader), self.max_iters_in_epoch)
        description = 'training'
    else:
        iter_in_epoch = len(dataloader)
        description = 'evaluating'
    # run batches
    trange = tqdm(enumerate(dataloader),
                  total=iter_in_epoch,
                  desc=description)
    for i, batch in trange:
        if training and i >= iter_in_epoch:
            break
        if training:
            output, batch_loss = \
                self._run_iter(batch, training)
            batch_loss /= self.grad_accumulate_steps
            # accumulate gradient - zero_grad
            if i % self.grad_accumulate_steps == 0:
                # TODO: call zero gradient here
                self.optimizer.zero_grad()
            # TODO: Call backward on `batch_loss` here.
            ########################################################
            # Error Here
            ########################################################
            with amp.scale_loss(batch_loss, self.optimizer) as scaled_loss:
                scaled_loss.backward()
            # batch_loss.backward()
            # accumulate gradient - step
            if (i + 1) % self.grad_accumulate_steps == 0:
                # TODO: update gradient here
                self.optimizer.step()
        else:
            with torch.no_grad():
                output, batch_loss = \
                    self._run_iter(batch, training)
        # accumulate loss and metric scores
        loss += batch_loss.item()
        for metric in self.metrics:
            metric.update(output, batch)
        trange.set_postfix(
            loss=loss / (i + 1),
            **{m.name: m.print_score() for m in self.metrics})
    # calculate average loss and metrics
    loss /= iter_in_epoch
    epoch_log = {}
    epoch_log['loss'] = float(loss)
    for metric in self.metrics:
        score = metric.get_score()
        print('{}: {} '.format(metric.name, score))
        epoch_log[metric.name] = score
    print('loss=%f\n' % loss)
    return epoch_log
Environment
- Ubuntu 18.04
- python version : 3.7.1
- pytorch 1.0.1
- conda version : 4.6.8
- CUDA version: 10.0
- CUDA driver version: 410.48
- GPU: GeForce RTX 2070
Did I install it incorrectly?
$ git clone https://github.com/NVIDIA/apex.git
$ cd apex
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
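The AttributeError is raised from inside apex's own Python code, so the package was at least importable. A likelier cause than a bad install (this is an assumption, since the posted code does not show the setup) is that amp.scale_loss ran before amp.initialize(model, optimizer, opt_level=...) was called, because that call is what populates the amp state. The sketch below is a pure-Python stand-in for that mechanism, not apex code: AmpState, Properties, initialize, scale_loss and try_scale here are hypothetical, simplified names that only mimic the attribute access seen in the traceback.

```python
from contextlib import contextmanager


class AmpState:
    """Hypothetical stand-in for apex's global amp state object."""


class Properties:
    """Hypothetical stand-in for the options object stored on the state."""

    def __init__(self, enabled=True):
        self.enabled = enabled


_amp_state = AmpState()


def initialize(model, optimizer, enabled=True):
    # Plays the role of amp.initialize: the only place that sets opt_properties.
    _amp_state.opt_properties = Properties(enabled)
    return model, optimizer


@contextmanager
def scale_loss(loss, optimizer):
    # Same attribute access as the failing line in the traceback.
    if not _amp_state.opt_properties.enabled:
        yield loss
        return
    yield loss * 128.0  # arbitrary illustrative loss scale


def try_scale(loss):
    """Return the scaled loss, or the error message if the state is missing."""
    try:
        with scale_loss(loss, optimizer=None) as scaled:
            return scaled
    except AttributeError as exc:
        return str(exc)


before = try_scale(1.0)                 # state not initialized yet
initialize(model=None, optimizer=None)  # populate the state once, up front
after = try_scale(1.0)                  # scaling now works
print(before)
print(after)
```

If this is the cause, the fix in the real training script would be to call amp.initialize once, after the model and optimizer are created and before the first _run_epoch.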
Issue Analytics
- State:
- Created 5 years ago
- Comments:6 (4 by maintainers)
Can I close this?
Yes, with dynamic loss scaling, it’s normal to see this message near the beginning of training and occasionally later in training. This is how amp adjusts the loss scale: amp checks gradients for infs and nans after each backward(), and if it finds any, amp skips the optimizer.step() for that iteration and reduces the loss scale for the next iteration.
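The skip-and-reduce behaviour described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not apex's implementation: DynamicLossScaler, should_step, the initial scale of 2**16 and the backoff factor of 0.5 are all hypothetical choices, and gradients are plain floats instead of tensors. The sketch also leaves out any logic for raising the scale again after a run of successful steps.

```python
import math


class DynamicLossScaler:
    """Hypothetical, simplified scaler; not apex's actual code."""

    def __init__(self, init_scale=2.0 ** 16, backoff=0.5):
        self.scale = init_scale
        self.backoff = backoff

    def should_step(self, grads):
        """Check gradients after backward(); return True if the step should run."""
        overflow = any(math.isinf(g) or math.isnan(g) for g in grads)
        if overflow:
            # Skip this optimizer step and use a smaller scale next iteration.
            self.scale *= self.backoff
            return False
        return True


scaler = DynamicLossScaler()
history = []
batches = (
    [0.1, -0.2],            # finite gradients: step runs
    [float("inf"), 0.3],    # inf found: step skipped, scale halved
    [0.05, float("nan")],   # nan found: step skipped, scale halved again
    [0.2, 0.1],             # finite again: step runs at the reduced scale
)
for grads in batches:
    stepped = scaler.should_step(grads)
    history.append((stepped, scaler.scale))

for stepped, scale in history:
    print("step" if stepped else "skip", scale)
```

A skipped iteration changes no weights, which is why occasional skips early in training are harmless.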