
[trainer] new in pytorch: `torch.optim._multi_tensor` faster optimizers


Back in September, PyTorch introduced torch.optim._multi_tensor (https://github.com/pytorch/pytorch/pull/43507), which should be much more efficient in situations with lots of small feature tensors (transformers) and thus should show an appreciable speed-up in training. If you're interested in the progress of this project, here is the stack to track: https://github.com/pytorch/pytorch/pull/48223

This feature is currently at an alpha stage, so users can try it out by simply replacing torch.optim with torch.optim._multi_tensor in the HF Trainer or in their own trainer.
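
As a rough illustration (not from the original issue), the swap can be as small as importing AdamW from the private _multi_tensor namespace instead of torch.optim. Note that _multi_tensor is an experimental, private module, so it may be missing or behave differently in other PyTorch versions:

```python
import torch
from torch import nn
from torch.optim import _multi_tensor  # experimental namespace, PyTorch >= 1.7

# Toy model and data purely for illustration.
model = nn.Linear(512, 512)
inputs, targets = torch.randn(64, 512), torch.randn(64, 512)

# The only change vs. a regular setup: AdamW comes from the
# multi-tensor namespace instead of torch.optim.
optimizer = _multi_tensor.AdamW(model.parameters(), lr=1e-3)

loss = nn.functional.mse_loss(model(inputs), targets)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```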

Eventually it'll replace torch.optim, so there is nothing else we need to do.

@blefaudeux, who alerted me to this improvement, suggested it should give good speed-ups for DDP/Sharded DDP training.

If resources allow, it'd be good to run some benchmarks. Please feel free to beat me to it.

Thanks to @blefaudeux for the heads up, and @izdeby for working on this enhancement and clarifying where things are at.

Heads up to @sgugger and @patrickvonplaten - nothing else needs to be done.

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 5
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
stas00 commented, Jan 15, 2022

Yes, I was just about to revisit it.

edit: I thought you might have wanted to work on that, but the PyTorch team asks us to run a profiler on it and such, so I will probably look into testing it out again.

— original comment —

Do you want to take the lead on this experiment, @jaketae?

The new --optim HF Trainer argument just got merged, so you can quickly implement --optim adamw_torch_multi_tensor in the same way as --optim adamw.
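
Until such an --optim value exists, here is a minimal sketch of how one might wire it in by overriding Trainer.create_optimizer in a subclass; the class name is made up, and unlike the real Trainer implementation this skips the usual no-weight-decay parameter groups:

```python
import torch
from torch.optim import _multi_tensor
from transformers import Trainer

class MultiTensorTrainer(Trainer):
    # Hypothetical subclass: use the multi-tensor AdamW with the same
    # hyperparameters TrainingArguments would hand to the stock AdamW.
    def create_optimizer(self):
        if self.optimizer is None:
            self.optimizer = _multi_tensor.AdamW(
                self.model.parameters(),
                lr=self.args.learning_rate,
                betas=(self.args.adam_beta1, self.args.adam_beta2),
                eps=self.args.adam_epsilon,
                weight_decay=self.args.weight_decay,
            )
        return self.optimizer
```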

You can use this tool for benchmarking, https://github.com/huggingface/transformers/pull/14934, if it helps. I think it's pretty stable now; I will propose it as a PR.

1 reaction
blefaudeux commented, Feb 2, 2021

You must have a really strange bottleneck in that test if neither the latest fairscale nor these optimizers change anything. These optimizers are measurably faster in isolation, and sure enough we see a difference in the fairscale CI, even on a dummy job / small model (see, for instance, the two last jobs).
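
For anyone who wants to check the "faster in isolation" claim themselves, a rough micro-benchmark along these lines (not the benchmarking tool referenced above) should show the gap; the sizes and step counts are arbitrary, and on CUDA you would move the tensors to the GPU and add torch.cuda.synchronize() around the timing:

```python
import time
import torch
from torch.optim import _multi_tensor

# Many small parameter tensors, roughly the shape of a transformer's
# parameter list, stepped repeatedly with both implementations.
def make_params(n=200, size=256):
    params = [torch.randn(size, size, requires_grad=True) for _ in range(n)]
    for p in params:
        p.grad = torch.randn_like(p)
    return params

def time_steps(optimizer, steps=100):
    start = time.perf_counter()
    for _ in range(steps):
        optimizer.step()
    return time.perf_counter() - start

print("torch.optim.AdamW:         ", time_steps(torch.optim.AdamW(make_params(), lr=1e-3)))
print("torch.optim._multi_tensor: ", time_steps(_multi_tensor.AdamW(make_params(), lr=1e-3)))
```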
