distributed.joblib does not run while joblib does, or: how to execute arbitrary code on a worker right before sending it a task, for deserialization?
I've found that ipyparallel's

```python
dview.execute('import sys; sys.path.append("/shared/dir/with/source/code")')
```

and similar hacks using `execute` are quite helpful for setting up the environment before sending a task. This way things like deserialization won't be messed up and will, probably, do the right thing.
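For context, the full ipyparallel pattern looks roughly like this (a sketch, assuming a running ipcluster):

```python
import ipyparallel as ipp

rc = ipp.Client()   # connect to a running ipcluster
dview = rc[:]       # DirectView over all engines

# make the shared source tree importable on every engine
# before any task that needs it is submitted
dview.execute('import sys; sys.path.append("/shared/dir/with/source/code")')
```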
Currently I’m dealing with a simple joblib program:
```python
import numpy as np
import distributed.joblib  # registers the 'dask.distributed' joblib backend
from joblib import Parallel, delayed, parallel_backend

# makeRicker, Perturb and superf come from the spie1d code
# living in /shared/dir/with/source/code
nloops = 5
wavelet = makeRicker(nt=1024, dt=0.004, f=30)
pinv = [Perturb(nt=1024, dt=0.004, nOffsets=256, dx=25, motionType=2, srcDepth=10, rvrDepth=10) for m in range(nloops)]
misfit = np.zeros(nm)  # nm is defined elsewhere in the full script
with parallel_backend('dask.distributed', scheduler_host='scheduler:8786'):
    pmisfit = Parallel(n_jobs=-1)(map(delayed(superf), range(nloops)))
```
So when I run it I get this:
```
/dist/anaconda/lib/python2.7/site-packages/distributed/protocol/pickle.pyc in loads()
     57 def loads(x):
     58     try:
---> 59         return pickle.loads(x)
     60     except Exception:
     61         logger.info("Failed to deserialize %s", x[:10000], exc_info=True)

ImportError: No module named spie1d
```
The missing module lives in that `/shared/dir/with/source/code` directory and gets pulled in by `superf`, because `superf` uses the `pinv` and `wavelet` global variables.
The very same example works fine with joblib's multiprocessing backend (see the variant below), so I'm a little puzzled why `distributed.joblib` is not working. That's why I think a `pre_task_accept` closure on the worker class, or a `cluster.execute` method, would be useful.
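For reference, the working variant just swaps the backend; everything else is identical to the script above:

```python
from joblib import Parallel, delayed, parallel_backend

# same call, but with joblib's multiprocessing backend; the forked child
# processes inherit the parent's sys.path, so the spie1d import succeeds
with parallel_backend('multiprocessing'):
    pmisfit = Parallel(n_jobs=-1)(map(delayed(superf), range(nloops)))
```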
Could you please advise what I should do in cases like this?
P.S. The code in `/shared/dir/with/source/code` is not pure Python: it contains precompiled modules (`.so` files, interfaces to some Fortran libraries).
Top GitHub Comments
The distributed joblib backend just runs on a dask-distributed cluster. Any of the normal ways of making Python libraries available to the workers should work fine here. A few possibilities:

- Use `Client.run` and do what you're doing above.
- Use `Client.upload_file` to upload local Python files to the workers and place them on the workers' path. See this stackoverflow question for more info.

Both approaches are sketched below.

Hi @thoth291, I don't know if you have had a chance to try John's suggestion. If you do that and have more questions, I encourage you to open a new issue to discuss 😄
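For reference, here is a minimal sketch of both suggestions against a dask.distributed `Client`. The connection string and path are taken from the question above; `spie1d.py` stands in for a hypothetical pure-Python, single-file version of the package, since `upload_file` handles `.py` files and `.egg`/`.zip` archives but may not help with the compiled `.so` parts:

```python
from dask.distributed import Client

client = Client('scheduler:8786')

# Option 1 (Client.run): execute a setup function once on every worker,
# the dask analogue of ipyparallel's dview.execute
def add_shared_path():
    import sys
    sys.path.append('/shared/dir/with/source/code')

client.run(add_shared_path)

# Option 2 (Client.upload_file): ship a local file to every worker and
# place it on the worker's path
client.upload_file('spie1d.py')  # hypothetical pure-Python module
```

Option 1 relies on the shared directory being mounted at the same location on every worker; Option 2 instead copies the code to the workers, which only works if the module is something `upload_file` can ship.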