
DataParallel is used by auto_model with a single GPU

See original GitHub issue

🐛 Bug description

I am not sure whether it is a bug or a feature:

DataParallel is being applied by idist.auto_model even in a single-process, non-distributed context (backend=None, nproc_per_node=1). What is the reason behind this choice? Does it bring any speed improvement?

The only way to prevent it is to set os.environ["CUDA_VISIBLE_DEVICES"] = "0" for single-GPU contexts.
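A minimal sketch of the reported behavior, assuming a machine where more than one GPU is visible (the model and its shapes are illustrative, not from the issue):

```python
import os
# Reported workaround: restrict visibility to a single GPU before CUDA
# is initialized, so auto_model sees a true single-GPU environment.
# os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch.nn as nn
import ignite.distributed as idist

model = idist.auto_model(nn.Linear(10, 2))
# With backend=None and several visible GPUs this prints DataParallel;
# with the workaround uncommented it stays nn.Linear.
print(type(model))
```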

Environment

  • PyTorch Version (e.g., 1.4): 1.7.1
  • Ignite Version (e.g., 0.3.0): 0.4.8
  • OS (e.g., Linux): Linux
  • How you installed Ignite (conda, pip, source): pip
  • Python version: 3.8

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

1 reaction
vfdev-5 commented, Feb 3, 2022

> Can we maybe check the world_size?

Yes, we use world_size to set up DDP. If world_size is defined and > 1, then there is a distributed process group and there is no point in using DP.

Here is the code: https://github.com/pytorch/ignite/blob/6d83dd72bb0bb7e655cd284789f367b46ab36a9e/ignite/distributed/auto.py#L201-L230

If there is no distributed process group but more than one GPU is available, we can use DP.
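A simplified paraphrase of that decision, as a sketch rather than the actual code from the linked auto.py:

```python
import torch
import torch.nn as nn
import ignite.distributed as idist

def wrap_model_sketch(model: nn.Module) -> nn.Module:
    # world_size > 1 means a distributed process group exists, so use DDP.
    if idist.get_world_size() > 1:
        return nn.parallel.DistributedDataParallel(model)
    # No process group, but several GPUs visible to this single process: DP.
    if torch.cuda.device_count() > 1:
        return nn.parallel.DataParallel(model)
    # Plain single-GPU or CPU run: nothing to wrap.
    return model
```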

To enable a distributed process group, the user can specify the backend in idist.Parallel, and a group will be created automatically using all available processes.

> Is it justified to keep this use case, now that DDP is out and is faster than DP? I don’t fully understand the different use cases, so I might be wrong, in which case I understand that we should not change this behaviour

In our case we leave the decision to the user. By launching a single process (python main.py) on a machine with N GPUs, they can either stay with a single process and use DP, or spawn N sub-processes (ignite then internally creates a distributed process group, so auto_model will use DDP).
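For example, a sketch of the second option using idist.Parallel (the backend and nproc_per_node values here are illustrative):

```python
import torch.nn as nn
import ignite.distributed as idist

def training(local_rank):
    # Inside the spawned processes a distributed group exists, so
    # auto_model wraps the model in DDP rather than DataParallel.
    model = idist.auto_model(nn.Linear(10, 2))
    print(local_rank, type(model))

if __name__ == "__main__":
    # backend="nccl" for GPUs; nproc_per_node spawns one process per GPU.
    with idist.Parallel(backend="nccl", nproc_per_node=2) as parallel:
        parallel.run(training)
```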

0 reactions
H4dr1en commented, Feb 3, 2022

> Yes, it makes perfect sense, but how can ignite know that you have nproc_per_node=1?

Can we maybe check the world_size?

> In addition, there can be (old) cases when we would like to use DP: one process using multiple GPUs.

Is it justified to keep this use case, now that DDP is out and is faster than DP? I don’t fully understand the different use cases, so I might be wrong, in which case I understand that we should not change this behaviour

Read more comments on GitHub.
