
AttributeError occurred at amp.scale_loss

See original GitHub issue

Problem

I tried to use amp, and this error occurred:

Traceback (most recent call last):
  File "train.py", line 75, in <module>
    main(args)
  File "train.py", line 54, in main
    [model_checkpoint, metrics_logger])
  File "/home/dwaydwaydway/adl/adl-hw1-example-code/src/base_predictor.py", line 75, in fit_dataset
    log_train = self._run_epoch(dataloader, True)
  File "/home/dwaydwaydway/adl/adl-hw1-example-code/src/base_predictor.py", line 182, in _run_epoch
    with amp.scale_loss(batch_loss, self.optimizer) as scaled_loss:
  File "/home/dwaydwaydway/anaconda3/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/home/dwaydwaydway/anaconda3/lib/python3.7/site-packages/apex/amp/handle.py", line 59, in scale_loss
    if not _amp_state.opt_properties.enabled:
AttributeError: 'AmpState' object has no attribute 'opt_properties'
> /home/dwaydwaydway/anaconda3/lib/python3.7/site-packages/apex/amp/handle.py(59)scale_loss()
-> if not _amp_state.opt_properties.enabled:
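
This AttributeError typically means amp.scale_loss ran before amp.initialize: opt_properties is attached to apex's global _amp_state by amp.initialize, so skipping that call leaves the attribute missing. A minimal sketch of the expected setup (the Linear model here is a stand-in for illustration, not part of the original code):

import torch
from apex import amp

model = torch.nn.Linear(10, 10).cuda()        # stand-in model for illustration
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# amp.initialize must run once, before any amp.scale_loss call;
# it is what populates _amp_state.opt_properties internally.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")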

This is my code:

def _run_epoch(self, dataloader, training):
        # set model training/evaluation mode
        self.model.train(training)

        # run batches for train
        loss = 0

        # reset metric accumulators
        for metric in self.metrics:
            metric.reset()

        if training:
            iter_in_epoch = min(len(dataloader), self.max_iters_in_epoch)
            description = 'training'
        else:
            iter_in_epoch = len(dataloader)
            description = 'evaluating'

        # run batches
        trange = tqdm(enumerate(dataloader),
                      total=iter_in_epoch,
                      desc=description)
        for i, batch in trange:
            if training and i >= iter_in_epoch:
                break

            if training:
                output, batch_loss = \
                    self._run_iter(batch, training)

                batch_loss /= self.grad_accumulate_steps

                # accumulate gradient - zero_grad
                if i % self.grad_accumulate_steps == 0:
                    # TODO: call zero gradient here
                    self.optimizer.zero_grad()

                # TODO: Call backward on `batch_loss` here.
########################################################
# Error here
########################################################
                with amp.scale_loss(batch_loss, self.optimizer) as scaled_loss:
                    scaled_loss.backward()
#                 batch_loss.backward()
                
                # accumulate gradient - step
                if (i + 1) % self.grad_accumulate_steps == 0:
                    # TODO: update gradient here
                    self.optimizer.step()
            else:
                with torch.no_grad():
                    output, batch_loss = \
                        self._run_iter(batch, training)

            # accumulate loss and metric scores
            loss += batch_loss.item()
            for metric in self.metrics:
                metric.update(output, batch)
            trange.set_postfix(
                loss=loss / (i + 1),
                **{m.name: m.print_score() for m in self.metrics})

        # calculate average loss and metrics
        loss /= iter_in_epoch

        epoch_log = {}
        epoch_log['loss'] = float(loss)
        for metric in self.metrics:
            score = metric.get_score()
            print('{}: {} '.format(metric.name, score))
            epoch_log[metric.name] = score
        print('loss=%f\n' % loss)
        return epoch_log
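
Once amp.initialize has run, the backward section marked above can stay as written. For the gradient accumulation this loop performs, apex's docs also describe a delay_unscale flag that keeps gradients scaled on the non-stepping iterations; a hedged sketch of that variant, reusing the names from the method above:

                step_now = (i + 1) % self.grad_accumulate_steps == 0
                # delay_unscale=True skips unscaling while gradients accumulate
                with amp.scale_loss(batch_loss, self.optimizer,
                                    delay_unscale=not step_now) as scaled_loss:
                    scaled_loss.backward()
                if step_now:
                    self.optimizer.step()
                    self.optimizer.zero_grad()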

Environment

  • Ubuntu 18.04
  • Python version: 3.7.1
  • PyTorch version: 1.0.1
  • conda version: 4.6.8
  • CUDA version: 10.0
  • CUDA driver version: 410.48
  • GPU: GeForce RTX 2070

Did I install it incorrectly?

$ git clone https://github.com/NVIDIA/apex.git
$ cd apex
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
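
Those commands match apex's documented source install, so the install itself looks right. One hedged way to check that the compiled extensions actually built (amp_C is the CUDA extension the --cuda_ext flag produces; an ImportError here would point to a build problem rather than a usage one):

$ python -c "from apex import amp; import amp_C; print('apex CUDA extensions OK')"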

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
mcarilli commented, Apr 4, 2019

Can I close this?

1 reaction
mcarilli commented, Mar 22, 2019

Yes, with dynamic loss scaling, it’s normal to see this message near the beginning of training and occasionally later in training. This is how amp adjusts the loss scale: amp checks gradients for infs and nans after each backward(), and if it finds any, amp skips the optimizer.step() for that iteration and reduces the loss scale for the next iteration.
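
In other words, the behavior mcarilli describes amounts to roughly this per-iteration logic (a conceptual sketch only, not apex's actual code; the backoff factor is illustrative):

import torch

def amp_style_step(optimizer, params, loss_scale, backoff=0.5):
    # Check gradients for infs/nans after backward()
    overflow = any(p.grad is not None and
                   not torch.isfinite(p.grad).all() for p in params)
    if overflow:
        # Skip this iteration's optimizer.step() and shrink the loss scale
        return loss_scale * backoff, False
    optimizer.step()
    return loss_scale, True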

Read more comments on GitHub >

Top Results From Across the Web

Automatic Mixed Precision package - torch.amp - PyTorch
Ordinarily, “automatic mixed precision training” with datatype of torch.float16 uses torch.autocast and torch.cuda.amp.GradScaler together, as shown in the ...
Read more >
Source code for transformers.trainer - Hugging Face
For example, under ``DeepSpeed``, the inner model is wrapped in ``DeepSpeed`` ... ShardedGradScaler() if self.sharded_ddp is not None else torch.cuda.amp.
Read more >
AttributeError: 'collections.OrderedDict' object has no attribute ...
I meet the same problem today when using the Deeplab. I think the main reason is that the output from deeplab is 'class...
Read more >
AttributeError: module 'torch.cuda.amp' has no attribute 'autocast'
# Scales loss. Calls backward() on scaled loss to create scaled gradients. scaler.scale(loss).backward().
Read more >
Mixed precision training on heterogenous graph - Questions
AttributeError Traceback (most recent call last) Input In [12], in <cell ... 34 scaler.scale(loss).backward() 35 scaler.step(optimizer) 36 ...
Read more >
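
The first result above points at the torch.cuda.amp API that later superseded apex amp; for comparison, a minimal sketch of that pattern (model, criterion, optimizer, and dataloader assumed to exist):

import torch

scaler = torch.cuda.amp.GradScaler()
for inputs, targets in dataloader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():            # run the forward pass in mixed precision
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()              # backward on the scaled loss
    scaler.step(optimizer)                     # unscales grads; skips step on inf/nan
    scaler.update()                            # adjusts the loss scale for the next iteration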
