SSHCluster expects conda environment to be at the same path on all systems
What happened:
When using SSHCluster on machines where the conda environments are installed at different paths, the cluster fails to start.
What you expected to happen: The correct conda environment should be activated.
It seems that SSHCluster tries to call the python executable directly, at the same absolute path that the current Python is running from, which only works if every machine mirrors that layout. It may be more robust to activate the conda environment with the same name as the current one and use the python it provides. It would also be good to be able to specify a conda environment in the kwargs, as sketched below.
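As a partial workaround today, the remote_python keyword (visible in the SSHCluster signature in the traceback below) can override the interpreter path used on each host. A minimal sketch, assuming the environment paths from the example below exist and that the installed distributed version accepts a list of per-host paths (newer releases do; older ones may only take a single string):

from dask.distributed import SSHCluster

# Sketch only: one interpreter path per host, in the same order as the
# hosts list. A single string would apply the same path everywhere,
# which is exactly the limitation described above.
cluster = SSHCluster(
    ["localhost", "HostB"],
    remote_python=[
        "/tmp/condaA/test/bin/python",  # scheduler on Host A (localhost)
        "/tmp/condaB/test/bin/python",  # worker on HostB
    ],
)

This sidesteps the path mismatch, but it still requires knowing each environment's absolute path up front, which is what a conda-environment kwarg would avoid.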
Minimal Complete Verifiable Example:
# On Host A
conda create -p /tmp/condaA/test python ipython dask
conda activate /tmp/condaA/test
# On Host B
conda create -p /tmp/condaB/test python ipython dask
conda activate /tmp/condaB/test
# On Host A
from dask.distributed import SSHCluster
cluster = SSHCluster(["localhost", "HostB"])
...
distributed.deploy.ssh - INFO - env: ‘/tmp/condaA/test/bin/python’: No such file or directory
...
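For comparison, the activate-on-the-remote-side behaviour proposed above could look roughly like this (a sketch of the proposal, not what SSHCluster currently does; the scheduler address is a placeholder, and distributed.cli.dask_worker is the default worker_module from the signature below):

# Sketch: resolve the environment on the remote host itself via conda run,
# by prefix (-p) or by name (-n), instead of reusing the local path.
ssh HostB conda run -p /tmp/condaB/test python -m distributed.cli.dask_worker tcp://HostA:8786

Because conda run resolves the environment on the remote machine, the local installation prefix never enters the launch command.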
Full Example Traceback
This traceback is from a real run, so the paths don't quite match the simplified example above.
distributed.deploy.ssh - INFO - distributed.scheduler - INFO - -----------------------------------------------
distributed.deploy.ssh - INFO - distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
distributed.deploy.ssh - INFO - distributed.scheduler - INFO - -----------------------------------------------
distributed.deploy.ssh - INFO - distributed.scheduler - INFO - Clear task state
distributed.deploy.ssh - INFO - distributed.scheduler - INFO - Scheduler at: tcp://10.51.100.15:8786
distributed.deploy.ssh - INFO - env: ‘/Users/jtomlinson/miniconda3/envs/coiledstream/bin/python’: No such file or directory
Task exception was never retrieved
future: <Task finished name='Task-45' coro=<_wrap_awaitable() done, defined at /Users/jtomlinson/miniconda3/envs/coiledstream/lib/python3.8/asyncio/tasks.py:677> exception=Exception('Worker failed to start')>
Traceback (most recent call last):
File "/Users/jtomlinson/miniconda3/envs/coiledstream/lib/python3.8/asyncio/tasks.py", line 684, in _wrap_awaitable
return (yield from awaitable.__await__())
File "/Users/jtomlinson/miniconda3/envs/coiledstream/lib/python3.8/site-packages/distributed/deploy/spec.py", line 50, in _
await self.start()
File "/Users/jtomlinson/miniconda3/envs/coiledstream/lib/python3.8/site-packages/distributed/deploy/ssh.py", line 129, in start
raise Exception("Worker failed to start")
Exception: Worker failed to start
distributed.deploy.ssh - INFO - env: ‘/Users/jtomlinson/miniconda3/envs/coiledstream/bin/python’: No such file or directory
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-2-dbe5c7142de0> in <module>
----> 1 cluster = SSHCluster(["localhost", "10.51.0.32"], connect_options=[{}, {"username": "jacob"}])
~/miniconda3/envs/coiledstream/lib/python3.8/site-packages/distributed/deploy/ssh.py in SSHCluster(hosts, connect_options, worker_options, scheduler_options, worker_module, remote_python, **kwargs)
352 for i, host in enumerate(hosts[1:])
353 }
--> 354 return SpecCluster(workers, scheduler, name="SSHCluster", **kwargs)
~/miniconda3/envs/coiledstream/lib/python3.8/site-packages/distributed/deploy/spec.py in __init__(self, workers, scheduler, worker, asynchronous, loop, security, silence_logs, name)
255 self._loop_runner.start()
256 self.sync(self._start)
--> 257 self.sync(self._correct_state)
258
259 async def _start(self):
~/miniconda3/envs/coiledstream/lib/python3.8/site-packages/distributed/deploy/cluster.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
167 return future
168 else:
--> 169 return sync(self.loop, func, *args, **kwargs)
170
171 async def _get_logs(self, scheduler=True, workers=True):
~/miniconda3/envs/coiledstream/lib/python3.8/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
337 if error[0]:
338 typ, exc, tb = error[0]
--> 339 raise exc.with_traceback(tb)
340 else:
341 return result[0]
~/miniconda3/envs/coiledstream/lib/python3.8/site-packages/distributed/utils.py in f()
321 if callback_timeout is not None:
322 future = asyncio.wait_for(future, callback_timeout)
--> 323 result[0] = yield future
324 except Exception as exc:
325 error[0] = sys.exc_info()
~/miniconda3/envs/coiledstream/lib/python3.8/site-packages/tornado/gen.py in run(self)
733
734 try:
--> 735 value = future.result()
736 except Exception:
737 exc_info = sys.exc_info()
~/miniconda3/envs/coiledstream/lib/python3.8/site-packages/distributed/deploy/spec.py in _correct_state_internal(self)
333 for w in workers:
334 w._cluster = weakref.ref(self)
--> 335 await w # for tornado gen.coroutine support
336 self.workers.update(dict(zip(to_open, workers)))
337
~/miniconda3/envs/coiledstream/lib/python3.8/site-packages/distributed/deploy/spec.py in _()
48 async with self.lock:
49 if self.status == "created":
---> 50 await self.start()
51 assert self.status == "running"
52 return self
~/miniconda3/envs/coiledstream/lib/python3.8/site-packages/distributed/deploy/ssh.py in start(self)
127 line = await self.proc.stderr.readline()
128 if not line.strip():
--> 129 raise Exception("Worker failed to start")
130 logger.info(line.strip())
131 if "worker at" in line:
Exception: Worker failed to start
Environment:
- Dask version: 2.23.0
- Python version: 3.8.5
- Operating System: macOS 10.14 and Ubuntu 10.04
- Install method (conda, pip, source): conda
Top GitHub Comments
Yes please, that would be great!
@jacobtomlinson I can work on a PR for this if it is fine with you. Thanks!