DataParallel is used by auto_model with single GPU
🐛 Bug description
I am not sure whether this is a bug or a feature: DataParallel is being applied/patched by idist.auto_model in a single-GPU context (backend=None, nproc_per_node=1). What is the reason behind this choice? Does it bring any speed improvements?
The only way I found to prevent it is to set os.environ["CUDA_VISIBLE_DEVICES"] = "0" for single-GPU contexts.
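For reference, a minimal sketch of the behaviour and the workaround (assumed setup: a machine with several visible GPUs and no distributed backend initialized; not code from the issue itself):

```python
import os
import torch.nn as nn
import ignite.distributed as idist

# Workaround: uncomment to expose only one device so auto_model skips DataParallel.
# os.environ["CUDA_VISIBLE_DEVICES"] = "0"

model = idist.auto_model(nn.Linear(10, 2))
# With several visible GPUs and no distributed group, this prints
# <class 'torch.nn.parallel.data_parallel.DataParallel'>; otherwise the plain module.
print(type(model))
```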
Environment
- PyTorch Version (e.g., 1.4): 1.7.1
- Ignite Version (e.g., 0.3.0): 0.4.8
- OS (e.g., Linux): Linux
- How you installed Ignite (conda, pip, source): pip
- Python version: 3.8
Yes, we are using world_size to set up DDP. If world_size is defined and > 1, then there is a distributed processing group and there is no point in using DP.
Here is the code: https://github.com/pytorch/ignite/blob/6d83dd72bb0bb7e655cd284789f367b46ab36a9e/ignite/distributed/auto.py#L201-L230
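Paraphrasing the linked logic, a simplified sketch of the decision (not the library's exact code):

```python
import torch
import ignite.distributed as idist

world_size = idist.get_world_size()  # > 1 only when a distributed group exists

if world_size > 1:
    choice = "DistributedDataParallel"  # distributed group present -> DDP
elif torch.cuda.device_count() > 1:
    choice = "DataParallel"             # single process, several GPUs -> DP
else:
    choice = "no wrapping"              # single process, at most one GPU
print(choice)
```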
If there is no distributed processing group, but we have more than one GPU available, we can use DP.
To enable a distributed processing group, the user can specify the backend in idist.Parallel, and a group will be automatically created using all available processes. In our case we leave the decision to the user: by launching a single process (python main.py) on a machine with N GPUs, they can either stay with a single process and use DP, or spawn N sub-processes (Ignite then internally creates a distributed process group, so auto_model will use DDP); see the launch sketch below.

Can we maybe check the world_size?
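For context, a minimal launch sketch (not taken from the issue) showing how idist.Parallel creates a process group so that world_size > 1 and auto_model picks DDP; the backend and process count are illustrative:

```python
import torch.nn as nn
import ignite.distributed as idist

def training(local_rank):
    # Inside each spawned process a distributed group exists, so
    # idist.get_world_size() > 1 and auto_model wraps with DistributedDataParallel.
    model = idist.auto_model(nn.Linear(10, 2))
    print(local_rank, type(model))

# Illustrative values: NCCL backend, 2 processes on one node.
# Without this context manager (plain `python main.py`) there is a single
# process, and auto_model falls back to DataParallel when several GPUs are visible.
with idist.Parallel(backend="nccl", nproc_per_node=2) as parallel:
    parallel.run(training)
```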
Is it justified to keep this use case now that DDP is available and faster than DP? I don't fully understand the different use cases, so I might be wrong; in that case I understand that we should not change this behaviour.