Distributed processes not finding modules
I have a cluster of PCs on the same network, all reading from the same mounted Python directory.
I have a layout of code like this:

```
Project\
    package\
        data\
        __init__.py
        fitting.py
        code2.py
        etc...
    output\
    script.py
```
The workers are started in the same mounted directory as the project above. However, when I run script.py, the worker spews out a load of unicode nonsense, then the traceback below, and then apparently crashes. It seems the module isn't there:
```
Traceback (most recent call last):
  File "E:/Dropbox/Project/script.py", line 14, in <module>
    fitter.save_results('output')
  File "E:\Dropbox\Project\Package\fitting.py", line 410, in save_results
    r = future.result()
  File "C:\Anaconda\lib\site-packages\distributed\executor.py", line 96, in result
    six.reraise(*result)
  File "C:\Anaconda\lib\site-packages\distributed\core.py", line 68, in loads
    return pickle.loads(x)
ImportError: No module named package.fitting
```
script.py:

```python
from distributed import Executor
from package import fitting

executor = Executor('127.0.0.1:8786')

T = prepare_data()                # pandas DataFrame
fitter = fitting.Fitter()         # my fitting object
fitter.fit_multiple(T, executor)  # create futures on the object
fitter.save_results()             # save to disk on the local machine
```
fitting.py:

```python
class Fitter(object):
    ....

    def fit_multiple(self, data_frame, executor):
        # submit `fit` for each row; `fit` is pure when append=False
        # but requires other modules in the package
        self.futures = [executor.submit(self.fit, series, append=False)
                        for _, series in data_frame.iterrows()]

    def save_results(self):
        # generic save to HDF5
        return [saving_function(f.result()) for f in self.futures]
Issue Analytics
- State:
- Created 7 years ago
- Comments:7 (3 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I am having the same issue. I ran dask-scheduler and dask-workers on my local machine to test. If the program is launched from the same directory, it runs smoothly, but the dask workers do not seem to find the module from another Python file in that same directory.
Your workers don't seem to be able to find the `package.fitting` package. Can you verify that a normal Python session started in the same place is able to find that package? Perhaps you can install it or put it on the `PYTHONPATH` to help your workers out? The workers are just Python processes. They aren't doing anything special with paths or your software environment. It's mostly up to you to ensure that all of your workers have the same software environment.
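A quick way to run that check end to end is to build a throwaway package in a temporary directory and confirm that a fresh Python process can import it. Substitute the directory your workers actually start in for the hypothetical `/tmp/proj`:

```shell
# Create a minimal package (mirrors the package/fitting.py layout above)
mkdir -p /tmp/proj/package
touch /tmp/proj/package/__init__.py
echo "def fit(): pass" > /tmp/proj/package/fitting.py

# A fresh interpreter can import it once the directory is on PYTHONPATH
PYTHONPATH=/tmp/proj python3 -c "import package.fitting; print('import ok')"
```

If the same one-liner fails in the workers' start directory, the problem is the path, not distributed.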
How are you setting up your workers? If it is with `ssh`, then beware that actions in `.bashrc`, or wherever you set up your user environment, sometimes don't persist to ssh sessions.
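That pitfall can be demonstrated locally: a variable exported in your interactive shell does not survive into a process started with a clean environment, which is roughly what a non-login ssh session that skips `.bashrc` looks like. The `/tmp/example` path is purely illustrative:

```shell
# Exported in the current shell: child processes inherit it
export PYTHONPATH=/tmp/example
python3 -c "import os; print(os.environ.get('PYTHONPATH'))"   # /tmp/example

# env -i wipes the environment, as a .bashrc-skipping session would
env -i "$(command -v python3)" -c "import os; print(os.environ.get('PYTHONPATH'))"  # None
```

Setting `PYTHONPATH` explicitly in the command that launches each worker avoids relying on shell startup files altogether.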