Trainer batch size auto scaling
🚀 Feature request
Since Trainer handles both batch_size and gradient_accumulation_steps, it seems like it could detect some out-of-memory situations and handle those scenarios automatically.
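For context, the trade-off this relies on is that the effective batch size per optimizer step is roughly per_device_train_batch_size × gradient_accumulation_steps (times the number of devices), so the two can be traded against each other without changing the optimization schedule. A rough sketch with illustrative values:

import transformers

# Both configurations process ~32 samples per optimizer step on a single
# device; the second one just needs roughly half the peak activation memory.
big_batches = transformers.TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=32,
    gradient_accumulation_steps=1,
)
small_batches = transformers.TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,
)
assert (
    big_batches.per_device_train_batch_size * big_batches.gradient_accumulation_steps
    == small_batches.per_device_train_batch_size * small_batches.gradient_accumulation_steps
)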
Motivation
I’ve been experimenting with model search (model_type, vocab_size, num_hidden_layers, hidden_size), and it’s been somewhat difficult to manage the correct batch size for each variant. To avoid trial & error and maintaining per-variant configuration tables, what I’ve been doing is detecting memory exhaustion and adapting the training arguments on the fly. It’s imperfect, but I wonder if there’s an official way to achieve this kind of behavior.
Your contribution
This is just a PoC; I’m sure there are several environments where this approach is problematic. In particular, CPU training on Linux is quite likely to trigger the OOM killer, in which case the entire process is simply wiped from memory and there is nothing left to catch. Nevertheless, this strategy seems helpful at least some of the time.
import gc

import torch
import transformers


class BatchAutoScaleTrainer(transformers.Trainer):
    '''Try to detect training crashes due to CUDA/CPU OOMs and rescale the
    batch size, trading it for gradient accumulation steps so that the
    effective batch size stays constant. An antiprime (highly composite)
    batch_size gives the best results, since it offers the most divisors
    to fall back to.
    Inspired by PyTorchLightning/pytorch-lightning#1638
    '''

    def _shrink_bs(self):
        # gradient_accumulation_steps is used by both .train() and
        # .evaluate(), so we need to find a setting that suits both.
        tbs = self.args.per_device_train_batch_size
        ebs = self.args.per_device_eval_batch_size
        gas = self.args.gradient_accumulation_steps
        for i in range(gas + 1, min(tbs, ebs) + 1):
            if tbs % i or ebs % i:
                continue
            # Keep tbs * gas (the effective batch size) constant while
            # lowering the per-device batch sizes.
            self.args.per_device_train_batch_size = (tbs * gas) // i
            self.args.per_device_eval_batch_size = (ebs * gas) // i
            self.args.gradient_accumulation_steps = i
            return True
        return False

    def _is_oom(self, err):
        # Shamelessly stolen from https://github.com/PyTorchLightning/pytorch-lightning/pull/1638/files#diff-5200c11792b86d6a07ea64820e126897aa2e3b7d3d295c92c19b141de6950afeR29-R32
        return len(err.args) == 1 and (
            "CUDA out of memory." in err.args[0]
            or "cuDNN error: CUDNN_STATUS_NOT_SUPPORTED." in err.args[0]
            or "DefaultCPUAllocator: can't allocate memory" in err.args[0]
            or "CUDA error: CUBLAS_STATUS_ALLOC_FAILED " in err.args[0]
        )

    def _auto_scale_batch_size(self, code):
        while True:
            try:
                gc.collect()
                if torch.cuda.is_available():
                    torch.cuda.empty_cache()
                return code()
            except RuntimeError as err:
                # Retry with a smaller batch size only for recognizable OOM
                # errors; anything else is re-raised unchanged.
                if self._is_oom(err) and self._shrink_bs():
                    continue
                raise
        assert False  # bug in _shrink_bs() most likely

    def train(self, *args, **kwds):
        train = super().train
        return self._auto_scale_batch_size(lambda: train(*args, **kwds))

    def evaluate(self, *args, **kwds):
        evaluate = super().evaluate
        return self._auto_scale_batch_size(lambda: evaluate(*args, **kwds))
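A minimal usage sketch (the model and dataset names below are placeholders; an antiprime per-device batch size such as 48 leaves plenty of divisors to fall back to):

args = transformers.TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=48,
    per_device_eval_batch_size=48,
    gradient_accumulation_steps=1,
)
trainer = BatchAutoScaleTrainer(
    model=my_model,              # placeholder
    args=args,
    train_dataset=my_train_set,  # placeholder
    eval_dataset=my_eval_set,    # placeholder
)
trainer.train()      # retries with 24x2, 16x3, 12x4, ... on OOM
metrics = trainer.evaluate()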
Any chance something like this might be integrated with the Trainer?
I am very nervous about adding that kind of auto-scaling feature to the Trainer. Note that the _is_oom test, for instance, will catch far more CUDA errors than just OOMs: having the wrong number of labels in your model will trigger an error with CUBLAS_STATUS_ALLOC_FAILED on most environments.
In a notebook, the kernel is in an unrecoverable state after the try/except (and torch.cuda.empty_cache() does not help), so this wouldn’t work there either.
So for now, my sense is that such a feature would be more painful for the user than beneficial, and I would leave the tuning of the batch size to the user.
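As an aside on the first point, newer PyTorch releases (1.13 and later) expose torch.cuda.OutOfMemoryError as a dedicated RuntimeError subclass, which would allow a stricter check than string matching; a sketch, assuming such a PyTorch version is available:

import torch

def _is_oom(self, err):
    # The CUDA caching allocator raises a dedicated exception type on
    # PyTorch 1.13+, so there is no need to pattern-match error strings
    # (and no risk of swallowing unrelated CUDA errors such as
    # CUBLAS_STATUS_ALLOC_FAILED).
    return isinstance(err, torch.cuda.OutOfMemoryError)

This would not address the second point about notebooks, though.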
@LysandreJik Indeed, thanks for the note. rentruewang/koila#12 is a hopeful sign.