dask-ssh fails if Python is installed in different paths across the workers

I tested distributed on a very simple office “cluster”: my laptop and an office server. Both run Ubuntu 14.04, but I installed Python differently on each: on my laptop I did a user install of Miniconda, and on the server I installed Anaconda as root. The corresponding Python paths are:

10.1.0.115 --> /home/aguirre/miniconda2/bin/python (my laptop)
10.1.0.118 --> /opt/anaconda2/bin/python (server)

If I manually launch dask-worker and dask-scheduler, everything works fine. But if I try dask-ssh, it does not work:

$ dask-ssh 10.1.0.{115,118}
---------------------------------------------------------------
                 Dask.distributed v1.11.0

Worker nodes:
  0: 10.1.0.115
  1: 10.1.0.118

scheduler node: 10.1.0.115:8786
---------------------------------------------------------------


[ scheduler 10.1.0.115:8786 ] : /home/aguirre/miniconda2/bin/python -m distributed.cli.dask_scheduler --port 8786
[ worker 10.1.0.115 ] : /home/aguirre/miniconda2/bin/python -m distributed.cli.dask_worker 10.1.0.115:8786 --host 10.1.0.115 --nthreads 0 --nprocs 1
[ worker 10.1.0.118 ] : /home/aguirre/miniconda2/bin/python -m distributed.cli.dask_worker 10.1.0.115:8786 --host 10.1.0.118 --nthreads 0 --nprocs 1
[ scheduler 10.1.0.115:8786 ] : distributed.scheduler - INFO - Scheduler at:           10.1.0.115:8786
[ scheduler 10.1.0.115:8786 ] : distributed.scheduler - INFO -      http at:           10.1.0.115:9786
[ scheduler 10.1.0.115:8786 ] : distributed.scheduler - WARNING - Could not start Bokeh web UI
[ scheduler 10.1.0.115:8786 ] : Traceback (most recent call last):
[ scheduler 10.1.0.115:8786 ] :   File "/home/aguirre/miniconda2/lib/python2.7/site-packages/distributed/cli/dask_scheduler.py", line 92, in main
[ scheduler 10.1.0.115:8786 ] :     bokeh_proc = subprocess.Popen(args)
[ scheduler 10.1.0.115:8786 ] :   File "/home/aguirre/miniconda2/lib/python2.7/subprocess.py", line 710, in __init__
[ scheduler 10.1.0.115:8786 ] :     errread, errwrite)
[ scheduler 10.1.0.115:8786 ] :   File "/home/aguirre/miniconda2/lib/python2.7/subprocess.py", line 1335, in _execute_child
[ scheduler 10.1.0.115:8786 ] :     raise child_exception
[ scheduler 10.1.0.115:8786 ] : OSError: [Errno 2] No such file or directory
[ worker 10.1.0.118 ] : bash: /home/aguirre/miniconda2/bin/python: No such file or directory
[ worker 10.1.0.118 ] : remote process exited with exit status 127

As you can see, the worker on 10.1.0.118 tries to call Python at the wrong path (/home/aguirre/miniconda2/bin/python), which is the Python path on the scheduler machine (10.1.0.115).
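
The "exit status 127" reported for the remote worker is the status a POSIX shell returns when it cannot find the command it was asked to run, which is consistent with the missing interpreter path. A minimal local sketch, assuming a POSIX system with /bin/sh and using a deliberately nonexistent path in place of the laptop-only interpreter, reproduces that status:

```python
import subprocess

# Deliberately nonexistent interpreter path, standing in for the
# laptop-only miniconda path that does not exist on the server.
MISSING_PYTHON = "/nonexistent/miniconda2/bin/python"

def exit_status_for(command: str) -> int:
    """Run `command` through /bin/sh and return its exit status."""
    result = subprocess.run(
        command,
        shell=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode

status = exit_status_for(MISSING_PYTHON + " -c 'print(1)'")
print(status)  # 127: the shell could not find the command
```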

I looked at the code, and I think the problem lies in line 189 of cluster.py: it builds the command that each worker launches using the Python path of the node where dask-ssh was run. To check, I hard-coded the Python path of 10.1.0.118 on line 189 of cluster.py, and it correctly launches that worker! However, it now fails to launch a worker on 10.1.0.115, which is expected.
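
One way around this is to look up the interpreter per host instead of reusing the local one. The sketch below is hypothetical: `PYTHON_BY_HOST` and `worker_command` are illustrative names, not part of dask-ssh or cluster.py. The argument layout mirrors the worker commands printed in the log above, and unknown hosts fall back to a bare `python` that the remote shell resolves via its own PATH.

```python
import shlex

# Hypothetical per-host interpreter table; the paths are the ones reported
# in this issue, but the mapping itself is not a dask-ssh feature.
PYTHON_BY_HOST = {
    "10.1.0.115": "/home/aguirre/miniconda2/bin/python",  # laptop
    "10.1.0.118": "/opt/anaconda2/bin/python",            # server
}

def worker_command(host, scheduler_addr, nthreads=0, nprocs=1,
                   python_by_host=PYTHON_BY_HOST, default_python="python"):
    """Build the worker launch command using *that host's* interpreter.

    Hosts missing from the table get a bare `python`, resolved remotely.
    """
    python = python_by_host.get(host, default_python)
    args = [python, "-m", "distributed.cli.dask_worker", scheduler_addr,
            "--host", host, "--nthreads", str(nthreads),
            "--nprocs", str(nprocs)]
    # Quote each argument so paths with spaces survive the remote shell.
    return " ".join(shlex.quote(arg) for arg in args)

for host in ["10.1.0.115", "10.1.0.118"]:
    print(worker_command(host, "10.1.0.115:8786"))
```

The design choice here is simply to make the interpreter a property of each remote host rather than of the launching machine; how such a table would be supplied (CLI flag, config file) is left open.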

BTW, I don’t think the exception raised by the scheduler (10.1.0.115) is related. It seems that it does not find bokeh in the PATH; however, when I launch the scheduler by itself, it does manage to start the Bokeh web UI. But let’s handle one problem at a time and focus on the Python path part of my case.
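
The scheduler traceback is what `subprocess.Popen` does when the executable it is asked to start cannot be found: it raises `OSError` with errno 2 (ENOENT, "No such file or directory"). A minimal sketch, assuming the guess above is right that the executable is simply missing from PATH, of checking for it first and warning instead of raising. It assumes Python 3.3+ for `shutil.which` (the log shows Python 2.7, where that function does not exist), and `start_optional_process` is a hypothetical helper, not the dask scheduler's code.

```python
import errno
import shutil
import subprocess
import warnings

# The failure mode from the traceback: Popen on a missing executable
# raises OSError with errno 2 (ENOENT).
try:
    subprocess.Popen(["no-such-binary-for-dask-example"])
except OSError as exc:
    print(exc.errno == errno.ENOENT)  # True

def start_optional_process(executable, *args):
    """Hypothetical helper: start `executable` only if it is on PATH.

    Returns the Popen object, or None (after a warning) when the
    executable cannot be found, instead of letting OSError propagate.
    """
    path = shutil.which(executable)
    if path is None:
        warnings.warn(f"{executable!r} not found on PATH; not starting it")
        return None
    return subprocess.Popen([path, *args])
```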

I don’t have many clues about how this could be solved, but with some guidance I’m willing to lend a hand!

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 15 (9 by maintainers)

Top GitHub Comments

mrocklin commented, Jun 17, 2016 (1 reaction)

@felipeam86 I see two starting options:

  1. Add notes to the documentation (dask/docs/source/...rst) stating that dask-ssh assumes similar environments on all machines, such as you might see on a system with a shared file system.
  2. Play with paramiko and learn how to create a connection that respects user environments. This probably involves some googling, some doc reading, and some experimentation on your own two-machine cluster setup. Then modify the implementation in dask/cluster.py to make the changes you needed to get things working in your experiments.
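
Option 2 amounts to letting the remote command pick up each user's own environment. One minimal sketch of that idea, not the approach dask-ssh actually adopted, is to send a bare `python` wrapped in a login shell so the remote profile sets PATH. It assumes bash on the remote hosts, and `login_shell_command` is a hypothetical name.

```python
import shlex

def login_shell_command(args, shell="bash"):
    """Wrap `args` so the remote host runs them inside a login shell.

    `bash -l` reads the user's login profile (e.g. ~/.profile), so a PATH
    configured there applies and a bare `python` resolves to that host's
    own interpreter rather than to a path copied from the launching machine.
    """
    inner = " ".join(shlex.quote(arg) for arg in args)
    return f"{shell} -lc {shlex.quote(inner)}"

cmd = login_shell_command(
    ["python", "-m", "distributed.cli.dask_worker", "10.1.0.115:8786",
     "--host", "10.1.0.118"])
print(cmd)
# bash -lc 'python -m distributed.cli.dask_worker 10.1.0.115:8786 --host 10.1.0.118'
```

The resulting string could then be passed to paramiko's `SSHClient.exec_command`. One caveat: conda and Anaconda installers often append their PATH line to `~/.bashrc`, and Ubuntu's default `~/.bashrc` returns early for non-interactive shells, so whether a login shell sees that PATH depends on how each host's profile files are set up.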

caseyjlaw commented, Aug 8, 2016 (0 reactions)

@hussainsultan @felipeam86 ‘conda run’ was removed without much notice in 4.0.10: https://github.com/conda/conda/issues/2682

There is talk of bringing it back, but I don’t see it in conda 4.1.11.
