Migrate tensor parallelism code to use OSLO

Is your feature request related to a problem? Please describe.

It would be good to remove the Megatron tensor parallelism code from NeoX. OSLO already supports tensor parallelism and has a slightly nicer interface.

Describe the solution you’d like

Steps:

1. Rewrite all current modules as plain PyTorch implementations, removing the mpu dependency from internal code as much as possible. Anything that is currently an mpu.[Column|Row]ParallelLinear or mpu.VocabParallelEmbedding should be replaced with its plain PyTorch equivalent (nn.Linear or nn.Embedding, respectively).
2. Write a mapping for NeoX modules, which OSLO uses to handle parallelization.
3. Ensure backwards compatibility.
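
A minimal sketch of step 1, assuming PyTorch and the NeoX/Megatron attribute names `dense_h_to_4h`, `dense_4h_to_h`, and `word_embeddings` (the class names `PlainMLP` and `PlainEmbedding`, and the GELU activation, are assumptions for illustration). Layers that were `mpu.ColumnParallelLinear` / `mpu.RowParallelLinear` become `nn.Linear`, and `mpu.VocabParallelEmbedding` becomes `nn.Embedding`, so the modules carry no parallelism logic of their own:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PlainMLP(nn.Module):
    """Transformer MLP built only from plain PyTorch layers (hypothetical sketch).

    In an mpu-based version, dense_h_to_4h would be an mpu.ColumnParallelLinear
    and dense_4h_to_h an mpu.RowParallelLinear; here both are ordinary nn.Linear.
    """

    def __init__(self, hidden_size: int, ffn_mult: int = 4):
        super().__init__()
        self.dense_h_to_4h = nn.Linear(hidden_size, ffn_mult * hidden_size)
        self.dense_4h_to_h = nn.Linear(ffn_mult * hidden_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Project up, apply GELU, project back down to hidden_size.
        return self.dense_4h_to_h(F.gelu(self.dense_h_to_4h(x)))


class PlainEmbedding(nn.Module):
    """Token embedding using nn.Embedding instead of mpu.VocabParallelEmbedding."""

    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, hidden_size)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.word_embeddings(input_ids)
```

Keeping the original attribute names means a tool that parallelizes layers by module name can still locate them after the rewrite.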
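
Step 2 could look roughly like the following. This is a hypothetical illustration of the idea only, not OSLO's actual mapping API: the `Column`, `Row`, and `VocabEmbedding` markers, the `NEOX_TP_MAPPING` dict, the `"GPTNeoX"` key, and the `parallel_style` helper are all invented for this sketch, and the name patterns assume NeoX's `attention` / `mlp` submodule naming.

```python
import fnmatch
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class Column:
    """Hypothetical marker: split the layer's output features across ranks."""
    pattern: str


@dataclass(frozen=True)
class Row:
    """Hypothetical marker: split the layer's input features across ranks."""
    pattern: str


@dataclass(frozen=True)
class VocabEmbedding:
    """Hypothetical marker: split the embedding's vocabulary rows across ranks."""
    pattern: str


Rule = Union[Column, Row, VocabEmbedding]

# Hypothetical mapping from model name to parallelization rules.
# Patterns are matched against fully qualified module names (e.g. from named_modules()).
NEOX_TP_MAPPING: Dict[str, List[Rule]] = {
    "GPTNeoX": [
        VocabEmbedding("*word_embeddings"),
        Column("*attention.query_key_value"),
        Row("*attention.dense"),
        Column("*mlp.dense_h_to_4h"),
        Row("*mlp.dense_4h_to_h"),
    ],
}


def parallel_style(model_name: str, module_name: str) -> Optional[str]:
    """Return the rule type that applies to module_name, or None to leave it unsplit."""
    for rule in NEOX_TP_MAPPING.get(model_name, []):
        # fnmatchcase does a full, case-sensitive match; '.' in the pattern is literal.
        if fnmatch.fnmatchcase(module_name, rule.pattern):
            return type(rule).__name__
    return None
```

Keeping the parallelization plan in data rather than in the module code is what lets the model itself stay plain PyTorch.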
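
For step 3, one concern is loading checkpoints saved with mpu layers, where each tensor-parallel rank holds only a shard of each parameter. The sketch below merges shards back into full `nn.Linear` / `nn.Embedding` tensors, assuming Megatron-style partitioning: `ColumnParallelLinear` splits the output dimension (dim 0 of the weight) and its bias, `RowParallelLinear` splits the input dimension (dim 1) and keeps a full bias on every rank, and `VocabParallelEmbedding` splits the vocabulary rows. The `merge_shards` helper is hypothetical, and fused layers may need extra reordering that is not handled here.

```python
from typing import List

import torch


def merge_shards(shards: List[torch.Tensor], kind: str, param: str) -> torch.Tensor:
    """Rebuild one full parameter from per-rank shards (hypothetical helper).

    kind  -- "column", "row", or "vocab": which mpu layer type the shards came from
    param -- "weight" or "bias"
    Shards must be ordered by tensor-parallel rank.
    """
    if kind in ("column", "vocab"):
        # Output features (or vocabulary rows) were split: join along dim 0.
        return torch.cat(shards, dim=0)
    if kind == "row":
        if param == "bias":
            # Row-parallel bias is replicated, so any single rank's copy is complete.
            return shards[0].clone()
        # Input features were split; nn.Linear weight is (out, in), so join along dim 1.
        return torch.cat(shards, dim=1)
    raise ValueError(f"unknown partition kind: {kind!r}")


# Usage: simulate a row-parallel layer split over 2 ranks, then restore it.
full = torch.nn.Linear(8, 6)
w_shards = list(torch.chunk(full.weight.detach(), 2, dim=1))
restored = torch.nn.Linear(8, 6)
restored.load_state_dict({
    "weight": merge_shards(w_shards, "row", "weight"),
    "bias": merge_shards([full.bias.detach()] * 2, "row", "bias"),
})
```

Because `nn.Linear` and `nn.Embedding` use the same `weight` / `bias` parameter names as the mpu layers, the merged tensors can be loaded under the existing state-dict keys.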

Issue Analytics

State:
Created 2 years ago
Comments: 6 (6 by maintainers)

Top GitHub Comments

6 reactions

hyunwoongko commented, Mar 1, 2022

I will actively support this work.

0 reactions

hyunwoongko commented, Mar 10, 2022

@sdtblck Did you check my branch?
