
Migrate tensor parallelism code to use OSLO

See original GitHub issue

Is your feature request related to a problem? Please describe.

It would be good to remove the Megatron tensor parallelism code from NeoX. OSLO currently supports tensor parallelism, with a slightly nicer interface.
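For context, OSLO's interface applies tensor parallelism to a plain PyTorch model after construction, rather than requiring the model to be written against mpu.* modules up front. The sketch below illustrates this; the `oslo.initialize` entry point and config keys are assumptions based on OSLO 1.x-era examples and may not match any specific release.

```python
# Hedged sketch: `oslo.initialize` and the config keys below are assumed
# from OSLO 1.x-era examples, not verified against a specific version.
import oslo
from transformers import GPT2LMHeadModel

# Build an ordinary (non-parallel) model first...
model = GPT2LMHeadModel.from_pretrained("gpt2")

# ...then let OSLO shard the linear/embedding layers for tensor
# parallelism, instead of baking mpu.* parallel modules into the model.
model = oslo.initialize(
    model,
    config={
        "model_parallelism": {
            "enable": True,
            "tensor_parallel_size": 4,
        }
    },
)
```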

Describe the solution you’d like

Steps:

  • Rewrite all current modules as plain PyTorch implementations, removing the mpu dependency from internal code as much as possible. Anything that is currently an mpu.[Column|Row]ParallelLinear or mpu.VocabParallelEmbedding should be replaced with its plain PyTorch equivalent (nn.Linear / nn.Embedding, respectively); see the sketch after this list.
  • Write a mapping for NeoX modules, which OSLO uses to handle parallelization (also sketched below).
  • Ensure backwards compatibility
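A minimal sketch of the first two steps, assuming a simplified NeoX-style MLP block and embedding. The module and attribute names are illustrative, and the mapping dict at the end only stands in for whatever structure OSLO actually consumes:

```python
import torch.nn as nn


class MLP(nn.Module):
    """A NeoX-style feed-forward block rewritten in plain PyTorch.

    Previously the two projections would have been
    mpu.ColumnParallelLinear / mpu.RowParallelLinear; here they are
    ordinary nn.Linear layers with no mpu dependency.
    """

    def __init__(self, hidden_size: int, ffn_size: int):
        super().__init__()
        self.dense_h_to_4h = nn.Linear(hidden_size, ffn_size)  # was ColumnParallelLinear
        self.dense_4h_to_h = nn.Linear(ffn_size, hidden_size)  # was RowParallelLinear
        self.act = nn.GELU()

    def forward(self, x):
        return self.dense_4h_to_h(self.act(self.dense_h_to_4h(x)))


class Embedding(nn.Module):
    """Token embedding without mpu.VocabParallelEmbedding."""

    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, hidden_size)  # was VocabParallelEmbedding

    def forward(self, input_ids):
        return self.word_embeddings(input_ids)


# Step 2: a mapping telling OSLO how to parallelize each plain module.
# The dict format is purely illustrative; OSLO defines its own mapping
# classes, which this does not attempt to reproduce.
NEOX_PARALLEL_MAPPING = {
    "dense_h_to_4h": "column",   # split output features across ranks
    "dense_4h_to_h": "row",      # split input features across ranks
    "word_embeddings": "vocab",  # split the vocabulary dimension
}
```

The point of the first step is that the modules become parallelism-agnostic; all sharding knowledge moves into the mapping, which OSLO consumes when it parallelizes the model.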

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

6 reactions
hyunwoongko commented, Mar 1, 2022

I will actively support this work.

0 reactions
hyunwoongko commented, Mar 10, 2022

@sdtblck Did you check my branch?


Top Results From Across the Web

  • Efficient Training on Multiple GPUs - Hugging Face: Switching from a single GPU to multiple requires some form of parallelism, as the work needs to be distributed. There are several techniques...
  • Migrate to TensorFlow 2: Learn how to migrate your TensorFlow code from TensorFlow 1.x to TensorFlow 2. It may take a little work to convert your code, ...
  • Parallel Binary Code Analysis - arXiv: The core of this work is a new parallel analysis for constructing control flow graphs (CFG construction), which constructs functions, basic...
  • Scaling deep learning workloads with PyTorch / XLA and ...: In our model code, we use PyTorch / XLA's optimizer_step(optimizer) to calculate the gradients and initiate this synchronous update.
  • Package List — Spack 0.20.0.dev0 documentation: This is a list of things you can install using Spack. It is automatically generated based on the packages in this Spack version...
