
AttributeError occurred at amp.scale_loss

See original GitHub issue

Problem

I tried to use amp, and this error occurred:

Traceback (most recent call last):
  File "train.py", line 75, in <module>
    main(args)
  File "train.py", line 54, in main
    [model_checkpoint, metrics_logger])
  File "/home/dwaydwaydway/adl/adl-hw1-example-code/src/base_predictor.py", line 75, in fit_dataset
    log_train = self._run_epoch(dataloader, True)
  File "/home/dwaydwaydway/adl/adl-hw1-example-code/src/base_predictor.py", line 182, in _run_epoch
    with amp.scale_loss(batch_loss, self.optimizer) as scaled_loss:
  File "/home/dwaydwaydway/anaconda3/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/home/dwaydwaydway/anaconda3/lib/python3.7/site-packages/apex/amp/handle.py", line 59, in scale_loss
    if not _amp_state.opt_properties.enabled:
AttributeError: 'AmpState' object has no attribute 'opt_properties'
> /home/dwaydwaydway/anaconda3/lib/python3.7/site-packages/apex/amp/handle.py(59)scale_loss()
-> if not _amp_state.opt_properties.enabled:
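
This AttributeError typically means amp.scale_loss ran before amp.initialize: opt_properties is attached to apex's global _amp_state by amp.initialize, so skipping that call leaves the attribute missing. A minimal sketch of the expected setup (the Linear model here is a stand-in for illustration, not part of the original code):

import torch
from apex import amp

model = torch.nn.Linear(10, 10).cuda()        # stand-in model for illustration
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# amp.initialize must run once, before any amp.scale_loss call;
# it is what populates _amp_state.opt_properties internally.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")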

This is my code:

def _run_epoch(self, dataloader, training):
        # set model training/evaluation mode
        self.model.train(training)

        # run batches for train
        loss = 0

        # reset metric accumulators
        for metric in self.metrics:
            metric.reset()

        if training:
            iter_in_epoch = min(len(dataloader), self.max_iters_in_epoch)
            description = 'training'
        else:
            iter_in_epoch = len(dataloader)
            description = 'evaluating'

        # run batches
        trange = tqdm(enumerate(dataloader),
                      total=iter_in_epoch,
                      desc=description)
        for i, batch in trange:
            if training and i >= iter_in_epoch:
                break

            if training:
                output, batch_loss = \
                    self._run_iter(batch, training)

                batch_loss /= self.grad_accumulate_steps

                # accumulate gradient - zero_grad
                if i % self.grad_accumulate_steps == 0:
                    # TODO: call zero gradient here
                    self.optimizer.zero_grad()

                # TODO: Call backward on `batch_loss` here.
########################################################
# Error here
########################################################
                with amp.scale_loss(batch_loss, self.optimizer) as scaled_loss:
                    scaled_loss.backward()
#                 batch_loss.backward()
                
                # accumulate gradient - step
                if (i + 1) % self.grad_accumulate_steps == 0:
                    # TODO: update gradient here
                    self.optimizer.step()
            else:
                with torch.no_grad():
                    output, batch_loss = \
                        self._run_iter(batch, training)

            # accumulate loss and metric scores
            loss += batch_loss.item()
            for metric in self.metrics:
                metric.update(output, batch)
            trange.set_postfix(
                loss=loss / (i + 1),
                **{m.name: m.print_score() for m in self.metrics})

        # calculate average loss and metrics
        loss /= iter_in_epoch

        epoch_log = {}
        epoch_log['loss'] = float(loss)
        for metric in self.metrics:
            score = metric.get_score()
            print('{}: {} '.format(metric.name, score))
            epoch_log[metric.name] = score
        print('loss=%f\n' % loss)
        return epoch_log
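
Once amp.initialize has run, the backward section marked above can stay as written. For the gradient accumulation this loop performs, apex's docs also describe a delay_unscale flag that keeps gradients scaled on the non-stepping iterations; a hedged sketch of that variant, reusing the names from the method above:

                step_now = (i + 1) % self.grad_accumulate_steps == 0
                # delay_unscale=True skips unscaling while gradients accumulate
                with amp.scale_loss(batch_loss, self.optimizer,
                                    delay_unscale=not step_now) as scaled_loss:
                    scaled_loss.backward()
                if step_now:
                    self.optimizer.step()
                    self.optimizer.zero_grad()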

Environment

  • Ubuntu 18.04
  • Python version: 3.7.1
  • PyTorch version: 1.0.1
  • conda version: 4.6.8
  • CUDA version: 10.0
  • CUDA driver version: 410.48
  • GPU: GeForce RTX 2070

Did I install it incorrectly?

$ git clone https://github.com/NVIDIA/apex.git
$ cd apex
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
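
Those commands match apex's documented source install, so the install itself looks right. One hedged way to check that the compiled extensions actually built (amp_C is the CUDA extension the --cuda_ext flag produces; an ImportError here would point to a build problem rather than a usage one):

$ python -c "from apex import amp; import amp_C; print('apex CUDA extensions OK')"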

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
mcarilli commented, Apr 4, 2019

Can I close this?

1 reaction
mcarilli commented, Mar 22, 2019

Yes, with dynamic loss scaling, it’s normal to see this message near the beginning of training and occasionally later in training. This is how amp adjusts the loss scale: amp checks gradients for infs and nans after each backward(), and if it finds any, amp skips the optimizer.step() for that iteration and reduces the loss scale for the next iteration.
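
In other words, the behavior mcarilli describes amounts to roughly this per-iteration logic (a conceptual sketch only, not apex's actual code; the backoff factor is illustrative):

import torch

def amp_style_step(optimizer, params, loss_scale, backoff=0.5):
    # Check gradients for infs/nans after backward()
    overflow = any(p.grad is not None and
                   not torch.isfinite(p.grad).all() for p in params)
    if overflow:
        # Skip this iteration's optimizer.step() and shrink the loss scale
        return loss_scale * backoff, False
    optimizer.step()
    return loss_scale, True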

Read more comments on GitHub >

Top Results From Across the Web

Automatic Mixed Precision package - torch.amp - PyTorch
Ordinarily, “automatic mixed precision training” with datatype of torch.float16 uses torch.autocast and torch.cuda.amp.GradScaler together, as shown in the ...
Read more >
Source code for transformers.trainer - Hugging Face
For example, under ``DeepSpeed``, the inner model is wrapped in ``DeepSpeed`` ... ShardedGradScaler() if self.sharded_ddp is not None else torch.cuda.amp.
Read more >
AttributeError: 'collections.OrderedDict' object has no attribute ...
I meet the same problem today when using the Deeplab. I think the main reason is that the output from deeplab is 'class...
Read more >
AttributeError: module 'torch.cuda.amp' has no attribute 'autocast'
# Scales loss. Calls backward() on scaled loss to create scaled gradients. scaler.scale(loss).backward().
Read more >
Mixed precision training on heterogenous graph - Questions
AttributeError Traceback (most recent call last) Input In [12], in <cell ... 34 scaler.scale(loss).backward() 35 scaler.step(optimizer) 36 ...
Read more >
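
The first result above points at the torch.cuda.amp API that later superseded apex amp; for comparison, a minimal sketch of that pattern (model, criterion, optimizer, and dataloader assumed to exist):

import torch

scaler = torch.cuda.amp.GradScaler()
for inputs, targets in dataloader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():            # run the forward pass in mixed precision
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()              # backward on the scaled loss
    scaler.step(optimizer)                     # unscales grads; skips step on inf/nan
    scaler.update()                            # adjusts the loss scale for the next iteration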
