
SSHCluster expects conda environment to be at the same path on all systems


What happened: When using SSHCluster across machines whose conda environments live at different paths, the workers fail to start.

What you expected to happen: The correct conda environment should be activated.

It seems that SSHCluster calls the Python executable directly, at the same path the current Python is running from. It may be more robust to activate the conda environment with the same name as the current one and use the Python that environment provides. It would also be useful to be able to specify a conda environment in the kwargs (sketched below).
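The sketch below illustrates that proposed name-based activation. It is illustrative only, not current SSHCluster behaviour; the environment-name derivation, the use of conda run, and the elided worker arguments are assumptions for the example:

# Hypothetical sketch of the proposed behaviour, not current SSHCluster code
import os
import sys

env_name = os.path.basename(sys.prefix)  # e.g. "test", whatever the prefix path is

# What SSHCluster effectively runs on the remote host today:
current_cmd = f"{sys.executable} -m distributed.cli.dask_worker ..."

# What this issue proposes instead: resolve the environment by *name* on the
# remote side (e.g. via `conda run`) and use the python that env provides:
proposed_cmd = f"conda run -n {env_name} python -m distributed.cli.dask_worker ..."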

Minimal Complete Verifiable Example:

# On Host A
conda create -p /tmp/condaA/test python ipython dask  # -n and -p are mutually exclusive; give the full prefix
conda activate /tmp/condaA/test

# On Host B
conda create -p /tmp/condaB/test python ipython dask
conda activate /tmp/condaB/test
# On Host A

from dask.distributed import SSHCluster
cluster = SSHCluster(["localhost", "HostB"])
...
distributed.deploy.ssh - INFO - env: ‘/tmp/condaA/test/bin/python’: No such file or directory
...
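As a possible workaround, the SSHCluster signature visible in the traceback below includes a remote_python keyword that overrides the interpreter path used on the remote hosts. Passing one path per host assumes the list form discussed in the comments at the end; treat this as an untested sketch:

from dask.distributed import SSHCluster

# Point each host at its own interpreter instead of reusing the local path
cluster = SSHCluster(
    ["localhost", "HostB"],
    remote_python=[
        "/tmp/condaA/test/bin/python",  # first host runs the scheduler
        "/tmp/condaB/test/bin/python",  # remaining hosts run workers
    ],
)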
Full Example Traceback

This traceback is from a real example, so the paths don't quite match the simplified example above.

distributed.deploy.ssh - INFO - distributed.scheduler - INFO - -----------------------------------------------
distributed.deploy.ssh - INFO - distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
distributed.deploy.ssh - INFO - distributed.scheduler - INFO - -----------------------------------------------
distributed.deploy.ssh - INFO - distributed.scheduler - INFO - Clear task state
distributed.deploy.ssh - INFO - distributed.scheduler - INFO -   Scheduler at:   tcp://10.51.100.15:8786
distributed.deploy.ssh - INFO - env: ‘/Users/jtomlinson/miniconda3/envs/coiledstream/bin/python’: No such file or directory
Task exception was never retrieved
future: <Task finished name='Task-45' coro=<_wrap_awaitable() done, defined at /Users/jtomlinson/miniconda3/envs/coiledstream/lib/python3.8/asyncio/tasks.py:677> exception=Exception('Worker failed to start')>
Traceback (most recent call last):
  File "/Users/jtomlinson/miniconda3/envs/coiledstream/lib/python3.8/asyncio/tasks.py", line 684, in _wrap_awaitable
    return (yield from awaitable.__await__())
  File "/Users/jtomlinson/miniconda3/envs/coiledstream/lib/python3.8/site-packages/distributed/deploy/spec.py", line 50, in _
    await self.start()
  File "/Users/jtomlinson/miniconda3/envs/coiledstream/lib/python3.8/site-packages/distributed/deploy/ssh.py", line 129, in start
    raise Exception("Worker failed to start")
Exception: Worker failed to start
distributed.deploy.ssh - INFO - env: ‘/Users/jtomlinson/miniconda3/envs/coiledstream/bin/python’: No such file or directory
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-2-dbe5c7142de0> in <module>
----> 1 cluster = SSHCluster(["localhost", "10.51.0.32"], connect_options=[{}, {"username": "jacob"}])

~/miniconda3/envs/coiledstream/lib/python3.8/site-packages/distributed/deploy/ssh.py in SSHCluster(hosts, connect_options, worker_options, scheduler_options, worker_module, remote_python, **kwargs)
    352         for i, host in enumerate(hosts[1:])
    353     }
--> 354     return SpecCluster(workers, scheduler, name="SSHCluster", **kwargs)

~/miniconda3/envs/coiledstream/lib/python3.8/site-packages/distributed/deploy/spec.py in __init__(self, workers, scheduler, worker, asynchronous, loop, security, silence_logs, name)
    255             self._loop_runner.start()
    256             self.sync(self._start)
--> 257             self.sync(self._correct_state)
    258
    259     async def _start(self):

~/miniconda3/envs/coiledstream/lib/python3.8/site-packages/distributed/deploy/cluster.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    167             return future
    168         else:
--> 169             return sync(self.loop, func, *args, **kwargs)
    170
    171     async def _get_logs(self, scheduler=True, workers=True):

~/miniconda3/envs/coiledstream/lib/python3.8/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
    337     if error[0]:
    338         typ, exc, tb = error[0]
--> 339         raise exc.with_traceback(tb)
    340     else:
    341         return result[0]

~/miniconda3/envs/coiledstream/lib/python3.8/site-packages/distributed/utils.py in f()
    321             if callback_timeout is not None:
    322                 future = asyncio.wait_for(future, callback_timeout)
--> 323             result[0] = yield future
    324         except Exception as exc:
    325             error[0] = sys.exc_info()

~/miniconda3/envs/coiledstream/lib/python3.8/site-packages/tornado/gen.py in run(self)
    733
    734                     try:
--> 735                         value = future.result()
    736                     except Exception:
    737                         exc_info = sys.exc_info()

~/miniconda3/envs/coiledstream/lib/python3.8/site-packages/distributed/deploy/spec.py in _correct_state_internal(self)
    333                 for w in workers:
    334                     w._cluster = weakref.ref(self)
--> 335                     await w  # for tornado gen.coroutine support
    336             self.workers.update(dict(zip(to_open, workers)))
    337

~/miniconda3/envs/coiledstream/lib/python3.8/site-packages/distributed/deploy/spec.py in _()
     48             async with self.lock:
     49                 if self.status == "created":
---> 50                     await self.start()
     51                     assert self.status == "running"
     52             return self

~/miniconda3/envs/coiledstream/lib/python3.8/site-packages/distributed/deploy/ssh.py in start(self)
    127             line = await self.proc.stderr.readline()
    128             if not line.strip():
--> 129                 raise Exception("Worker failed to start")
    130             logger.info(line.strip())
    131             if "worker at" in line:

Exception: Worker failed to start

Environment:

  • Dask version: 2.23.0
  • Python version: 3.8.5
  • Operating System: macOS 10.14 and Ubuntu 10.04
  • Install method (conda, pip, source): conda

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
jacobtomlinson commented, Aug 26, 2020

Yes please that would be great!

0 reactions
abduhbm commented, Aug 25, 2020

@jacobtomlinson I can work on a PR for this if it is fine with you. Thanks,
