"wall_clock_breakdown": true and overlap_comm: true

Hi, I have a question about how DeepSpeed measures communication time. I see that there is a timer that counts the time for allreduce, as in … But when I step into this function, I find that it eventually goes into …, which only performs CUDA synchronization when overlap_comm=True. If I remember correctly, the PyTorch backward pass is a blocking operation, so the communication finishes before we enter self.timers('backward_allreduce_microstep').start().
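
The question turns on the fact that CUDA work is launched asynchronously, so a wall-clock timer only captures GPU work if it synchronizes before reading the clock. The sketch below illustrates that in plain Python; it is not DeepSpeed's timer code. FakeStream is a hypothetical stand-in for a CUDA stream (a background thread that sleeps), and its synchronize() plays the role of torch.cuda.synchronize().

```python
import threading
import time


class FakeStream:
    """Hypothetical stand-in for a CUDA stream: launch() returns immediately
    while the 'kernel' (here, a sleep) runs in a background thread."""

    def __init__(self):
        self._pending = []

    def launch(self, seconds):
        worker = threading.Thread(target=time.sleep, args=(seconds,))
        worker.start()
        self._pending.append(worker)

    def synchronize(self):
        # Block until all queued work has finished.
        for worker in self._pending:
            worker.join()
        self._pending.clear()


class WallClockTimer:
    """Times a region; waits for queued work only when sync=True."""

    def __init__(self, stream, sync):
        self.stream = stream
        self.sync = sync
        self._start = None

    def start(self):
        if self.sync:
            self.stream.synchronize()
        self._start = time.perf_counter()

    def stop(self):
        if self.sync:
            self.stream.synchronize()
        return time.perf_counter() - self._start


def time_allreduce(sync, seconds=0.3):
    stream = FakeStream()
    timer = WallClockTimer(stream, sync)
    timer.start()
    stream.launch(seconds)  # stands in for an asynchronous allreduce
    elapsed = timer.stop()
    stream.synchronize()  # drain any leftover work before returning
    return elapsed


if __name__ == "__main__":
    print(f"without sync: {time_allreduce(sync=False) * 1000:.1f} ms")
    print(f"with sync:    {time_allreduce(sync=True) * 1000:.1f} ms")
```

Without synchronization the timer only sees the launch; with it, the timer sees the full 0.3 s of simulated work. Because start() also synchronizes, any work that was issued and finished earlier (for example, inside backward) is drained before the clock starts, so a synchronized timer around a later call reports only the work that remains.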

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (6 by maintainers)
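
For reference, both settings named in the issue title are DeepSpeed JSON config options: wall_clock_breakdown is a top-level key that turns on per-phase timing output like the log lines quoted in the comments, and overlap_comm sits under zero_optimization. A minimal sketch that emits such a config from Python; train_batch_size and the ZeRO stage are illustrative assumptions, not values from this issue.

```python
import json

# Minimal DeepSpeed config containing the two settings from the title.
# train_batch_size and the ZeRO stage are illustrative assumptions.
ds_config = {
    "train_batch_size": 32,
    # Time forward/backward/step phases and print the breakdown.
    "wall_clock_breakdown": True,
    "zero_optimization": {
        "stage": 2,
        # Try to overlap gradient reduction with the backward computation.
        "overlap_comm": True,
    },
}

config_json = json.dumps(ds_config, indent=2)
print(config_json)
```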

Top GitHub Comments

szhengac commented, Dec 24, 2020

Thanks. This explains it. I think it would be helpful to add some comments around that allreduce call; otherwise it is misleading.

Also, since DeepSpeed uses its own launcher, I cannot simply use nsys --profile deepspeed <args> for profiling, and mpirun also has issues. Is there a way I can use Nsight to profile the job? Recently I came across an issue where backward takes about 26 times as long as forward when training a 10B model. In general, backward costs only about 2x the forward (it can be more if the workers communicate inside backward, but it still shouldn't be 26x). This is strange, because I am using 128 A100 GPUs with 4 AWS EFA NICs enabled (400 Gb/s bandwidth), so the network should not be the problem. I need to use nsys to profile the training job to figure out why this happens.

rank=0 time (ms) | forward_microstep: 70.96 | backward_microstep: 1805.11 | backward_inner_microstep: 1749.92 | backward_allreduce_microstep: 55.12 | step_microstep: 9.21
rank=0 time (ms) | forward: 70.94 | backward: 1805.08 | backward_inner: 1749.89 | backward_allreduce: 55.10 | step: 9.18
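
As a quick check on the numbers above, the sketch below parses the second log line and compares the timers, using only the values printed in the comment; parse_timers is a hypothetical helper. Backward comes out at roughly 25x forward, backward_allreduce is about 3% of backward, and backward_inner plus backward_allreduce matches backward to within 0.1 ms, so nearly all of the backward time is attributed to backward_inner.

```python
import re

# Second summary line from the comment above (rank 0, times in ms).
LOG_LINE = (
    "rank=0 time (ms) | forward: 70.94 | backward: 1805.08 | "
    "backward_inner: 1749.89 | backward_allreduce: 55.10 | step: 9.18"
)


def parse_timers(line):
    """Return {timer_name: milliseconds} for every 'name: value' field."""
    return {name: float(ms) for name, ms in re.findall(r"(\w+): ([\d.]+)", line)}


t = parse_timers(LOG_LINE)
ratio = t["backward"] / t["forward"]
allreduce_share = t["backward_allreduce"] / t["backward"]
unaccounted = t["backward"] - (t["backward_inner"] + t["backward_allreduce"])

print(f"backward/forward: {ratio:.1f}x")
print(f"backward_allreduce share of backward: {allreduce_share:.1%}")
print(f"backward - (inner + allreduce): {unaccounted:.2f} ms")
```

This is consistent with the point raised in the question: if gradient communication happens inside backward (as overlap_comm is meant to allow), it would be counted in backward_inner rather than backward_allreduce. That is an inference from the numbers, not a confirmed diagnosis.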

kehuanfeng commented, Dec 10, 2021

Never mind, I managed to fix the mpirun issue and used Nsight to profile the training. I put the profiling results in a new issue, #620.

@szhengac could you please share how you got Nsight working with the DeepSpeed launcher? The command I am using is below, and the report doesn't contain any CUDA trace information:

nsys profile --trace=cuda deepspeed
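
The thread does not show the command that eventually worked, only that it went through mpirun. A common pattern, assumed here rather than confirmed by this issue, is to put nsys inside each rank's process instead of around the deepspeed launcher, so every worker gets its own report. The sketch below only builds such a command line. It assumes Open MPI, which sets OMPI_COMM_WORLD_RANK in every rank, and relies on nsys expanding %q{ENV_VAR} in the output file name; per_rank_nsys_command and train.py are hypothetical placeholders.

```python
import shlex


def per_rank_nsys_command(num_ranks, train_cmd, out_prefix="profile"):
    """Build an mpirun command that runs every rank under its own nsys session.

    Assumes Open MPI, which exports OMPI_COMM_WORLD_RANK to each process.
    nsys expands %q{OMPI_COMM_WORLD_RANK} in the output name, so each rank
    writes a separate report.
    """
    return [
        "mpirun", "-np", str(num_ranks),
        "nsys", "profile",
        "--trace=cuda,nvtx",  # collect CUDA activity and NVTX ranges
        "-o", f"{out_prefix}_rank%q{{OMPI_COMM_WORLD_RANK}}",
        *train_cmd,
    ]


if __name__ == "__main__":
    # "train.py" is a placeholder for your own DeepSpeed training script.
    cmd = per_rank_nsys_command(8, ["python", "train.py"])
    print(shlex.join(cmd))
```

Keeping one report per rank makes it easier to compare ranks, for example to see whether a slow backward on one worker lines up with waiting on communication.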
