Non-daemonic workers
Related to #2142, but the solution doesn't apply in my case. I have a use case for workers running in separate processes, but as non-daemons, because the worker processes themselves need to use multiprocessing. Here's an example:
import torch
import torch.distributed as dist
import torchvision
import os
from distributed import Client, LocalCluster


def worker_fn(rank, world_size):
    print('worker', rank)
    os.environ['MASTER_ADDR'] = '127.0.0.1'
    os.environ['MASTER_PORT'] = '8989'
    dist.init_process_group(
        backend=dist.Backend.NCCL,
        rank=rank,
        world_size=world_size,
    )
    print('initialized distributed', rank)
    if rank == 0:
        dataset = torchvision.datasets.MNIST(
            '../data/',
            train=True,
            download=True,
        )
    dist.barrier()
    if rank != 0:
        dataset = torchvision.datasets.MNIST(
            '../data/',
            train=True,
            download=False,
        )
    # load data, uses multiprocessing
    loader = torch.utils.data.DataLoader(
        dataset,
        sampler=torch.utils.data.distributed.DistributedSampler(
            dataset,
            rank=rank,
            num_replicas=world_size,
        ),
        num_workers=2,
    )
    print('constructed data loader', rank)
    # if cuda is available, initializes it as well
    assert torch.cuda.is_available()
    # do distributed training, but in this case it suffices to iterate
    for x, y in loader:
        pass


def main():
    world_size = 2
    cluster = LocalCluster(
        n_workers=world_size,
        processes=True,
        resources={
            'GPUS': 1,  # don't allow two tasks to run on the same worker
        },
    )
    cl = Client(cluster)
    futs = []
    for rank in range(world_size):
        futs.append(
            cl.submit(
                worker_fn,
                rank,
                world_size,
                resources={'GPUS': 1},
            )
        )
    for f in futs:
        f.result()


if __name__ == '__main__':
    main()
If processes=True, then we get an error about daemonic processes not being allowed to have children:
worker 0
worker 1
initialized distributed 1
initialized distributed 0
constructed data loader 0
constructed data loader 1
distributed.worker - WARNING - Compute Failed
Function: worker_fn
args: (0, 2)
kwargs: {}
Exception: AssertionError('daemonic processes are not allowed to have children',)
Traceback (most recent call last):
  File "scratch.py", line 152, in <module>
    main()
  File "scratch.py", line 148, in main
    f.result()
  File "/private/home/calebh/miniconda3/envs/fairtask2/lib/python3.6/site-packages/distributed/client.py", line 227, in result
    six.reraise(*result)
  File "/private/home/calebh/miniconda3/envs/fairtask2/lib/python3.6/site-packages/six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "scratch.py", line 123, in worker_fn
    for x, y in loader:
  File "/private/home/calebh/miniconda3/envs/fairtask2/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 193, in __iter__
    return _DataLoaderIter(self)
  File "/private/home/calebh/miniconda3/envs/fairtask2/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 469, in __init__
    w.start()
  File "/private/home/calebh/miniconda3/envs/fairtask2/lib/python3.6/multiprocessing/process.py", line 103, in start
    'daemonic processes are not allowed to have children'
AssertionError: daemonic processes are not allowed to have children
distributed.worker - WARNING - Compute Failed
Function: worker_fn
args: (1, 2)
kwargs: {}
Exception: AssertionError('daemonic processes are not allowed to have children',)
If processes=False, we get stuck at distributed initialization.
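For reference, the restriction behind the first error can be reproduced with the standard library alone, without Dask or PyTorch. The sketch below is only an illustration, not how distributed starts its workers: the helper names `_try_to_spawn` and `spawn_from_worker` are made up for this example, and it assumes a POSIX system where the "fork" start method is available. It starts a parent process either as a daemon or as a non-daemon and has that process try to start a child of its own, which is essentially what the DataLoader does with num_workers=2.

```python
import multiprocessing as mp

# Assumption: POSIX system with the "fork" start method available.
ctx = mp.get_context("fork")


def _noop():
    """Stand-in for the work a DataLoader worker process would do."""
    pass


def _try_to_spawn(queue):
    """Runs inside the outer process: try to start a child and report the outcome."""
    try:
        child = ctx.Process(target=_noop)
        child.start()  # fails with AssertionError if the current process is daemonic
        child.join()
        queue.put("ok")
    except AssertionError as exc:
        queue.put(str(exc))


def spawn_from_worker(daemon):
    """Start an outer process with the given daemon flag and return what it reported."""
    queue = ctx.Queue()
    worker = ctx.Process(target=_try_to_spawn, args=(queue,), daemon=daemon)
    worker.start()
    result = queue.get(timeout=30)  # read before join so the queue is drained
    worker.join()
    return result


if __name__ == "__main__":
    print("daemon=True: ", spawn_from_worker(daemon=True))
    print("daemon=False:", spawn_from_worker(daemon=False))
```

With daemon=True the outer process reports the same assertion message as in the traceback above; with daemon=False it can start and join its child normally. That is why the worker processes in my use case would need to be non-daemonic.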
Issue Analytics
- State:
- Created 4 years ago
- Reactions:1
- Comments:12 (11 by maintainers)
Top GitHub Comments
@zhanghang1989 I recommend raising a new issue. I recommend not repeating your comment on multiple issues.
I am new to dask. Is it possible to set --no-nanny when using dask-ssh?