Question: OneBitAdam on Eth/TCP

I’m interested in training BERT using multiple nodes with multiple GPUs (Titan-V). Our cluster is Kubernetes-based, and we don’t have InfiniBand interconnects, only 10 Gb Ethernet. Using the provided Dockerfile (with up-to-date DeepSpeed code), we’re unable to run it: mpiname and other OpenMPI functionality are missing, and the MVAPICH launcher is missing as well. Does OneBitAdam support such a setup? If so, could you please provide details on how to enable OneBitAdam on TCP/Ethernet-based networking? Thanks
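
For context, later DeepSpeed releases added an NCCL-based implementation of 1-bit Adam alongside the original MPI one, selected through the optimizer's `comm_backend_name` parameter. The NCCL path avoids the mpi4py/MVAPICH dependency described above. The sketch below assumes such a DeepSpeed version and only writes a config file; the batch sizes, `lr`, and `freeze_step` values are illustrative placeholders, not tuned settings, and the file name `ds_config.json` is hypothetical.

```python
import json

# DeepSpeed config selecting OneBitAdam with the NCCL communication backend
# (assumes a DeepSpeed release whose OneBitAdam accepts comm_backend_name).
# Numeric values are placeholders for illustration only.
ds_config = {
    "train_batch_size": 4096,
    "train_micro_batch_size_per_gpu": 16,
    "optimizer": {
        "type": "OneBitAdam",
        "params": {
            "lr": 4e-4,
            "freeze_step": 23000,         # steps of uncompressed Adam warmup
            "cuda_aware": False,          # only meaningful for the MPI backend
            "comm_backend_name": "nccl",  # "mpi" selects the MPI-based path
        },
    },
    "gradient_clipping": 1.0,
    "fp16": {"enabled": True},
}

# Hypothetical output path, passed to the training script's DeepSpeed config option.
with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

On Ethernet-only nodes, NCCL is commonly pointed at the right interface with the standard `NCCL_SOCKET_IFNAME` variable and told to skip InfiniBand with `NCCL_IB_DISABLE=1`; these have to be set in the environment of every training process (for example, in the pod spec), not in a separate config-writing script.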

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 12 (6 by maintainers)

Top GitHub Comments

1 reaction
peteriz commented, Oct 17, 2021

Quoting @HariharasudhanAS: "Hey @peteriz, were you able to try this out? I am setting up an academic lab and am not sure whether 10G Ethernet interconnects are sufficient."

Hi, I’m sorry, but I haven’t had a chance yet. It’s definitely on my to-do list, and I’ll post an update once I get to run such a setup.

0 reactions
HariharasudhanAS commented, Oct 11, 2021

Hey @peteriz, were you able to try this out? I am setting up an academic lab and am not sure whether 10G Ethernet interconnects are sufficient.
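
Whether 10G Ethernet is sufficient depends mostly on how long the per-step gradient exchange takes relative to compute. A rough, bandwidth-only estimate can be sketched with the standard ring all-reduce cost, where each participant sends about 2(N-1)/N times the gradient size. Assumptions, all hypothetical: BERT-Large at roughly 340M parameters, fp16 gradients, 4 nodes each acting as a single ring participant, no latency or compute overlap, and the "up to 5x less communication volume" figure quoted for 1-bit Adam applied as a flat divisor (its compressed all-reduce is not literally a ring, so this is only an approximation).

```python
def ring_allreduce_seconds(num_params: float, bytes_per_elem: float,
                           num_nodes: int, link_gbps: float) -> float:
    """Bandwidth-only estimate of one ring all-reduce over the inter-node link.

    Ignores latency, compute/communication overlap and intra-node traffic,
    treating each node as a single ring participant.
    """
    payload_bytes = num_params * bytes_per_elem
    # In a ring all-reduce, each participant sends 2 * (N - 1) / N times the payload.
    sent_bytes = 2 * (num_nodes - 1) / num_nodes * payload_bytes
    link_bytes_per_s = link_gbps * 1e9 / 8  # Gbit/s -> bytes/s
    return sent_bytes / link_bytes_per_s


# Assumed workload: BERT-Large (~340M parameters), fp16 gradients, 4 nodes, 10 GbE.
baseline = ring_allreduce_seconds(340e6, 2, 4, 10)
# Assumed reduction: the "up to 5x less communication volume" quoted for 1-bit Adam.
compressed = baseline / 5
print(f"uncompressed: {baseline:.2f} s/step, 1-bit Adam (5x less): {compressed:.2f} s/step")
```

Under these assumptions the estimate is about 0.82 s of pure communication per step uncompressed versus about 0.16 s with the 5x reduction; comparing that against the measured compute time per step gives a first sense of whether the link would be the bottleneck.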

Top Results From Across the Web

1-bit Adam: Up to 5x less communication volume and up to 3.4 ...
1-bit Adam: Up to 5x less communication volume and up to 3.4x faster training · On 03/07/2022 we released 0/1 Adam, which is...