How to debug with multi-gpu training
Hi, I am trying to debug multi-GPU training with PyCharm, but the multi-GPU training invokes the module torch.distributed.launch directly, and I could not find out how to debug that in PyCharm. I configured my run configuration this way:

[screenshot of the PyCharm run configuration not preserved]

but it threw the error 'No module named tools/train.py'.

Could you please help? I am trying to understand the code by debugging it.
Issue Analytics
- State:
- Created: 4 years ago
- Reactions: 2
- Comments: 6 (1 by maintainers)
Top Results From Across the Web
Debugging - Hugging Face
For multi-GPU training it requires DDP (torch.distributed.launch). This feature can be used with any nn.Module-based model. If you start ...

PyTorch 101, Part 4: Memory Management and Using Multiple ...
This article covers PyTorch's advanced GPU management features, how to optimise memory usage and best practices for debugging memory errors.

Profiling TensorFlow Multi GPU Multi Node Training Job with ...
This notebook walks you through creating a TensorFlow training job with the SageMaker Debugger profiling feature enabled. It will create a multi GPU...

Testing Multi GPU training on a Single GPU - PyTorch Lightning
I can only submit jobs to a cluster node (4-8 GPUs) and can't use the cluster for debugging. The code runs fine on a...

What's good practice for debugging distributed training? - Reddit
I think you answered your own question in terms of debugging with a single process + single GPU, then adjusting the parameters to...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I suggest using a single GPU for debugging. It is hard to debug in distributed training mode.
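If you still need to exercise the DDP code path while debugging, one workaround (not from this thread; a minimal sketch with placeholder names) is to skip the launcher and initialize a world-size-1 process group by hand, so the wrapped model runs in a single process that PyCharm can step through. The gloo backend is used here so the sketch also runs on a machine without a GPU:

```python
# Minimal sketch: run the DDP code path in ONE process so that an IDE
# debugger (e.g. PyCharm) can set breakpoints and step through normally.
# The linear model below is a toy stand-in for your real training code.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Hand-set the rendezvous variables that torch.distributed.launch
    # would normally export, then create a world-size-1 process group.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group(backend="gloo", rank=0, world_size=1)

    model = torch.nn.Linear(10, 1)   # placeholder model
    model = DDP(model)               # same wrapper the multi-GPU run uses

    out = model(torch.randn(4, 10))  # breakpoints work anywhere here
    out.sum().backward()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

With a single process there are no rank-dependent hangs, so breakpoints behave exactly as in an ordinary script.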
Solution: Module name: /home/user/.local/lib/python3.6/site-packages/torch/distributed/launch.py, or wherever your torch.distributed.launch is…
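For reference (this is an assumption about current PyCharm versions, not something stated in the thread): in the Run/Debug Configuration you can switch the target from "Script path" to "Module name", enter torch.distributed.launch as the module, and put the launcher arguments plus the training script in the Parameters field, e.g. --nproc_per_node=2 tools/train.py followed by your training options. This mirrors running python -m torch.distributed.launch --nproc_per_node=2 tools/train.py <args> from a terminal. The original 'No module named tools/train.py' error is what you typically get when the script path is typed into the module-name field, since Python then tries to import tools/train.py as a module.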