How to improve training speed?
Hello there,
Thanks for open-sourcing DLRM. I tried to train from scratch using ./bench/dlrm_s_criteo_terabyte.sh. While it trained well on all of my GPUs, I wasn't able to use them fully: I only saw 7-8% utilization on A100-PCIE-40G GPUs.
Why is the utilization low? Is it because pipeline parallelism needs to be implemented? I think I saw a PR under review for this.
Are there any tips to run this workload faster by utilizing the GPUs fully?
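One way to narrow this down, before reaching for pipeline parallelism, is to check whether the input pipeline is starving the GPUs, since single-digit utilization is often a data-loading bottleneck rather than a compute one. The sketch below is a hypothetical diagnostic (profile_loader_vs_compute and step_fn are placeholder names, not part of the DLRM repo) that times DataLoader waits against GPU compute:

```python
# Hypothetical diagnostic sketch (not part of the DLRM repo): time DataLoader
# waits versus forward/backward compute for a few iterations to see whether
# the input pipeline, rather than the model, is the bottleneck.
import time
import torch

def profile_loader_vs_compute(train_loader, step_fn, num_iters=50, device="cuda"):
    """step_fn(batch) is a placeholder that runs one forward/backward pass on `device`."""
    load_time, compute_time = 0.0, 0.0
    it = iter(train_loader)
    for _ in range(num_iters):
        t0 = time.perf_counter()
        batch = next(it)                  # time spent waiting on the DataLoader
        load_time += time.perf_counter() - t0

        t0 = time.perf_counter()
        step_fn(batch)                    # one training step on the GPU
        torch.cuda.synchronize(device)    # make queued GPU work show up in wall-clock time
        compute_time += time.perf_counter() - t0

    print(f"data loading: {load_time:.2f}s  compute: {compute_time:.2f}s")
    # If data loading dominates, adding DataLoader workers or faster storage
    # will help far more than model or pipeline parallelism.
```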
Issue Analytics
- Created: 2 years ago
- Reactions: 3
- Comments: 9 (3 by maintainers)
Top GitHub Comments
Unfortunately you cannot mix the --num-workers and --memory-map flags; see issue https://github.com/facebookresearch/dlrm/issues/159.

@rakshithvasudev I guess another option is to use the NVIDIA Merlin framework: https://github.com/NVIDIA-Merlin/HugeCTR. They provide very scalable distributed training and inference frameworks for recommender systems.
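For context on the first comment, --num-workers corresponds to the worker count of the PyTorch DataLoader that feeds batches to training. The sketch below shows that knob in isolation, assuming the benchmark constructs a standard torch.utils.data.DataLoader under the hood; the function name and default values are placeholders for illustration, not the repo's actual API:

```python
# Minimal sketch of what --num-workers controls, assuming a standard
# torch.utils.data.DataLoader is used under the hood; the function and defaults
# here are placeholders for illustration, not the repo's actual API.
from torch.utils.data import DataLoader

def make_train_loader(train_dataset, batch_size=2048, num_workers=8):
    # num_workers spawns parallel CPU worker processes to prepare batches so the
    # GPU is not left idle waiting on I/O and preprocessing; per issue #159 this
    # cannot be combined with the --memory-map path.
    return DataLoader(
        train_dataset,
        batch_size=batch_size,
        shuffle=False,
        num_workers=num_workers,
        pin_memory=True,  # pinned host memory speeds up host-to-device copies
    )
```

Raising num_workers trades CPU and RAM for input throughput, which is the usual first lever to try when GPU utilization is this low.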