question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

How to improve training speed?

See original GitHub issue

Hello there,

Thanks for open sourcing DLRM. I tried to train from scratch using the ./bench/dlrm_s_criteo_terabyte.sh. While it was able to train well on all of my GPUs, I wasn’t able to use the GPUs fully. I only saw 7-8% utilization on A100-PCIE-40G GPUs.

Why is the utilization low? Is it because pipeline parallelism needs to be implemented? I think I saw a PR under review for this.

Are there any tips to run this workload faster by utilizing GPUs fully?

Issue Analytics

  • State:closed
  • Created 2 years ago
  • Reactions:3
  • Comments:9 (3 by maintainers)

github_iconTop GitHub Comments

1reaction
mnaumovfbcommented, Nov 1, 2021

Unfortunately you can not mix --num-workers and --memory-map flags, see issue https://github.com/facebookresearch/dlrm/issues/159.

1reaction
tim5gocommented, Oct 28, 2021

@rakshithvasudev I guess another option is to use Nvidia Merlin framework: https://github.com/NVIDIA-Merlin/HugeCTR

They provide very scalable distributed training and inference frameworks for recommender system.

Read more comments on GitHub >

github_iconTop Results From Across the Web

7 tricks to speed up the training of a neural network
7 tricks to speed up the training of a neural network · Multi-GPU training · Learning rate scaling · Cyclic Learning Rate Schedules...
Read more >
How to Run Faster: Speed Training Guide
Sample workout: Run one mile at a pace that's about 10 seconds slower per mile than your 5K race pace, then rest for...
Read more >
4 Speed Exercises to Build Power and Run Faster
Box Jump · Bulgarian Split Squat With Rotation · Deadlift · Kneeling Hip Flexor Stretch · Strength Training.
Read more >
Speed Training: How to Get Faster
Improving your running speed requires a training program that focuses on leg strength and power, with appropriate technique training to best utilize your ......
Read more >
How to speed up training of a Neural Network?
Provide details and share your research! ... Asking for help, clarification, or responding to other answers. Making statements based on opinion; ...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found