[BUG] - Unable to create Dask clusters
OS system and architecture in which you are running QHub
GCP
Expected behavior
Ability to create dask clusters via dask-gateway
Actual behavior
Unable to create Dask Clusters
How to Reproduce the problem?
Choose the filesystem/dask environment OR create a new environment with the qhub-dask package.
from dask_gateway import Gateway
☝🏽 has a deprecation warning, probably not an issue right now.
/home/conda/filesystem/418492d4df19a27c85d04dd5e3e624b767b129c055cd3ee87f0005b5bb10318e-20220310-172659-095264-2-dask/lib/python3.9/site-packages/dask_gateway/client.py:21: FutureWarning: format_bytes is deprecated and will be removed in a future release. Please use dask.utils.format_bytes instead.
from distributed.utils import LoopRunner, format_bytes
gateway = Gateway()
options = gateway.cluster_options()
cluster_options fails with a
GatewayServerError: 500 Internal Server Error
Server got itself in trouble
And then
cluster = gateway.new_cluster()
fails with an error that mentions an odd-looking folder name
ValueError: [Errno 2] No such file or directory: '/home/conda/admin/envs/test-admin/conda-meta'
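For anyone trying to reproduce this, a small helper along these lines might make it easier to capture both failures in one go. This is only a sketch: `probe_gateway` is a hypothetical name, not part of dask-gateway, and it relies only on the `cluster_options()`, `new_cluster()`, and `shutdown()` calls that the client already exposes.

```python
def probe_gateway(gateway):
    """Call the two failing entry points and record what each one raised.

    Hypothetical helper for this report; `gateway` is expected to behave
    like a dask_gateway.Gateway instance.
    """
    results = {}

    try:
        gateway.cluster_options()
        results["cluster_options"] = "ok"
    except Exception as exc:  # e.g. GatewayServerError on an HTTP 500
        results["cluster_options"] = f"{type(exc).__name__}: {exc}"

    try:
        cluster = gateway.new_cluster()
    except Exception as exc:  # e.g. ValueError on an HTTP 404 or 422
        results["new_cluster"] = f"{type(exc).__name__}: {exc}"
    else:
        results["new_cluster"] = "ok"
        cluster.shutdown()  # do not leave a test cluster running

    return results
```

In a notebook this would be called as `probe_gateway(Gateway())` after `from dask_gateway import Gateway`; on an affected deployment both entries should presumably contain the error text shown above rather than "ok".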
Command output
>>> options = gateway.cluster_options()
---------------------------------------------------------------------------
GatewayServerError Traceback (most recent call last)
Input In [4], in <cell line: 4>()
1 from dask_gateway import Gateway
2 gateway = Gateway()
----> 4 options = gateway.cluster_options()
5 options
File /home/conda/dharhas@quansight.com/ef17a653925332c287418a5f465b971dedcea0bb1902e3abe411ee5c2372e0d8-20220311-013959-007907-9-dask-geo2/lib/python3.9/site-packages/dask_gateway/client.py:518, in Gateway.cluster_options(self, use_local_defaults, **kwargs)
504 def cluster_options(self, use_local_defaults=True, **kwargs):
505 """Get the available cluster configuration options.
506
507 Parameters
(...)
516 A dict of cluster options.
517 """
--> 518 return self.sync(
519 self._cluster_options, use_local_defaults=use_local_defaults, **kwargs
520 )
File /home/conda/dharhas@quansight.com/ef17a653925332c287418a5f465b971dedcea0bb1902e3abe411ee5c2372e0d8-20220311-013959-007907-9-dask-geo2/lib/python3.9/site-packages/dask_gateway/client.py:343, in Gateway.sync(self, func, *args, **kwargs)
339 future = asyncio.run_coroutine_threadsafe(
340 func(*args, **kwargs), self.loop.asyncio_loop
341 )
342 try:
--> 343 return future.result()
344 except BaseException:
345 future.cancel()
File /home/conda/dharhas@quansight.com/ef17a653925332c287418a5f465b971dedcea0bb1902e3abe411ee5c2372e0d8-20220311-013959-007907-9-dask-geo2/lib/python3.9/concurrent/futures/_base.py:446, in Future.result(self, timeout)
444 raise CancelledError()
445 elif self._state == FINISHED:
--> 446 return self.__get_result()
447 else:
448 raise TimeoutError()
File /home/conda/dharhas@quansight.com/ef17a653925332c287418a5f465b971dedcea0bb1902e3abe411ee5c2372e0d8-20220311-013959-007907-9-dask-geo2/lib/python3.9/concurrent/futures/_base.py:391, in Future.__get_result(self)
389 if self._exception:
390 try:
--> 391 raise self._exception
392 finally:
393 # Break a reference cycle with the exception in self._exception
394 self = None
File /home/conda/dharhas@quansight.com/ef17a653925332c287418a5f465b971dedcea0bb1902e3abe411ee5c2372e0d8-20220311-013959-007907-9-dask-geo2/lib/python3.9/site-packages/dask_gateway/client.py:497, in Gateway._cluster_options(self, use_local_defaults)
495 async def _cluster_options(self, use_local_defaults=True):
496 url = "%s/api/v1/options" % self.address
--> 497 resp = await self._request("GET", url)
498 data = await resp.json()
499 options = Options._from_spec(data["cluster_options"])
File /home/conda/dharhas@quansight.com/ef17a653925332c287418a5f465b971dedcea0bb1902e3abe411ee5c2372e0d8-20220311-013959-007907-9-dask-geo2/lib/python3.9/site-packages/dask_gateway/client.py:417, in Gateway._request(self, method, url, json)
415 raise GatewayClusterError(msg)
416 elif resp.status == 500:
--> 417 raise GatewayServerError(msg)
418 else:
419 resp.raise_for_status()
GatewayServerError: 500 Internal Server Error
Server got itself in trouble
>>> cluster = gateway.new_cluster()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [3], in <cell line: 1>()
----> 1 cluster = gateway.new_cluster()
File /home/conda/filesystem/418492d4df19a27c85d04dd5e3e624b767b129c055cd3ee87f0005b5bb10318e-20220310-172659-095264-2-dask/lib/python3.9/site-packages/dask_gateway/client.py:639, in Gateway.new_cluster(self, cluster_options, shutdown_on_close, **kwargs)
616 def new_cluster(self, cluster_options=None, shutdown_on_close=True, **kwargs):
617 """Submit a new cluster to the gateway, and wait for it to be started.
618
619 Same as calling ``submit`` and ``connect`` in one go.
(...)
637 cluster : GatewayCluster
638 """
--> 639 return GatewayCluster(
640 address=self.address,
641 proxy_address=self.proxy_address,
642 public_address=self._public_address,
643 auth=self.auth,
644 asynchronous=self.asynchronous,
645 loop=self.loop,
646 cluster_options=cluster_options,
647 shutdown_on_close=shutdown_on_close,
648 **kwargs,
649 )
File /home/conda/filesystem/418492d4df19a27c85d04dd5e3e624b767b129c055cd3ee87f0005b5bb10318e-20220310-172659-095264-2-dask/lib/python3.9/site-packages/dask_gateway/client.py:814, in GatewayCluster.__init__(self, address, proxy_address, public_address, auth, cluster_options, shutdown_on_close, asynchronous, loop, **kwargs)
802 def __init__(
803 self,
804 address=None,
(...)
812 **kwargs,
813 ):
--> 814 self._init_internal(
815 address=address,
816 proxy_address=proxy_address,
817 public_address=public_address,
818 auth=auth,
819 cluster_options=cluster_options,
820 cluster_kwargs=kwargs,
821 shutdown_on_close=shutdown_on_close,
822 asynchronous=asynchronous,
823 loop=loop,
824 )
File /home/conda/filesystem/418492d4df19a27c85d04dd5e3e624b767b129c055cd3ee87f0005b5bb10318e-20220310-172659-095264-2-dask/lib/python3.9/site-packages/dask_gateway/client.py:919, in GatewayCluster._init_internal(self, address, proxy_address, public_address, auth, cluster_options, cluster_kwargs, shutdown_on_close, asynchronous, loop, name)
917 self.status = "starting"
918 if not self.asynchronous:
--> 919 self.gateway.sync(self._start_internal)
File /home/conda/filesystem/418492d4df19a27c85d04dd5e3e624b767b129c055cd3ee87f0005b5bb10318e-20220310-172659-095264-2-dask/lib/python3.9/site-packages/dask_gateway/client.py:343, in Gateway.sync(self, func, *args, **kwargs)
339 future = asyncio.run_coroutine_threadsafe(
340 func(*args, **kwargs), self.loop.asyncio_loop
341 )
342 try:
--> 343 return future.result()
344 except BaseException:
345 future.cancel()
File /home/conda/filesystem/418492d4df19a27c85d04dd5e3e624b767b129c055cd3ee87f0005b5bb10318e-20220310-172659-095264-2-dask/lib/python3.9/concurrent/futures/_base.py:446, in Future.result(self, timeout)
444 raise CancelledError()
445 elif self._state == FINISHED:
--> 446 return self.__get_result()
447 else:
448 raise TimeoutError()
File /home/conda/filesystem/418492d4df19a27c85d04dd5e3e624b767b129c055cd3ee87f0005b5bb10318e-20220310-172659-095264-2-dask/lib/python3.9/concurrent/futures/_base.py:391, in Future.__get_result(self)
389 if self._exception:
390 try:
--> 391 raise self._exception
392 finally:
393 # Break a reference cycle with the exception in self._exception
394 self = None
File /home/conda/filesystem/418492d4df19a27c85d04dd5e3e624b767b129c055cd3ee87f0005b5bb10318e-20220310-172659-095264-2-dask/lib/python3.9/site-packages/dask_gateway/client.py:933, in GatewayCluster._start_internal(self)
931 self._start_task = asyncio.ensure_future(self._start_async())
932 try:
--> 933 await self._start_task
934 except BaseException:
935 # On exception, cleanup
936 await self._stop_internal()
File /home/conda/filesystem/418492d4df19a27c85d04dd5e3e624b767b129c055cd3ee87f0005b5bb10318e-20220310-172659-095264-2-dask/lib/python3.9/site-packages/dask_gateway/client.py:946, in GatewayCluster._start_async(self)
944 if self.status == "created":
945 self.status = "starting"
--> 946 self.name = await self.gateway._submit(
947 cluster_options=self._cluster_options, **self._cluster_kwargs
948 )
949 # Connect to cluster
950 try:
File /home/conda/filesystem/418492d4df19a27c85d04dd5e3e624b767b129c055cd3ee87f0005b5bb10318e-20220310-172659-095264-2-dask/lib/python3.9/site-packages/dask_gateway/client.py:535, in Gateway._submit(self, cluster_options, **kwargs)
533 options = self._config_cluster_options()
534 options.update(kwargs)
--> 535 resp = await self._request("POST", url, json={"cluster_options": options})
536 data = await resp.json()
537 return data["name"]
File /home/conda/filesystem/418492d4df19a27c85d04dd5e3e624b767b129c055cd3ee87f0005b5bb10318e-20220310-172659-095264-2-dask/lib/python3.9/site-packages/dask_gateway/client.py:413, in Gateway._request(self, method, url, json)
410 msg = await resp.text()
412 if resp.status in {404, 422}:
--> 413 raise ValueError(msg)
414 elif resp.status == 409:
415 raise GatewayClusterError(msg)
ValueError: [Errno 2] No such file or directory: '/home/conda/admin/envs/test-admin/conda-meta'
Versions and dependencies used.
conda 4.11.0, QHub 0.4 RC
Compute environment
GCP
Integrations
Dask
Anything else?
No response
Issue Analytics
- State:
- Created 2 years ago
- Comments:13 (13 by maintainers)
Hi @costrouc @iameskild, based on the above discussion, this seems to be the issue: the `get_packages` function will attempt to look for `/home/conda/group/envs/env_name` based on the list of available symlinks in `/group/envs`. Right now, those symlinks are not deleted after the environment is removed from conda-store, so the function attempts to open a path that no longer exists.

The solutions are:
`get_packages` search as pointed out by Eskild above #1162

As another sanity check, I deployed qhub on GCP using v0.4.0rc2 and this isn’t an issue. This means that this bug must have been introduced sometime in the last 8 days, since this commit: https://github.com/Quansight/qhub/commit/62196c98e86234264f0b26a43f46cb5df0c0fafe
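To illustrate the stale-symlink explanation above, here is a minimal, self-contained sketch of the failure mode and of one defensive way to handle it. It is not QHub's actual `get_packages` code: `list_env_packages` and `build_demo_envs` are hypothetical names, the directory layout is a simplified assumption, and the package file name is a placeholder.

```python
import os
import shutil
import tempfile
from pathlib import Path


def list_env_packages(envs_dir):
    """Map each environment name to the JSON files in its conda-meta directory.

    Hypothetical stand-in for a get_packages-style lookup. Entries whose
    target no longer exists (dangling symlinks) are skipped, not opened.
    """
    packages = {}
    for entry in sorted(Path(envs_dir).iterdir()):
        conda_meta = entry / "conda-meta"
        # iterdir() still lists a dangling symlink, but is_dir() follows the
        # link and returns False once the target has been deleted.
        if not conda_meta.is_dir():
            continue
        packages[entry.name] = sorted(p.name for p in conda_meta.glob("*.json"))
    return packages


def build_demo_envs(root):
    """Create one live environment and one stale symlink under root/envs."""
    root = Path(root)
    store = root / "store"
    envs = root / "envs"
    envs.mkdir()
    for name in ("live-env", "removed-env"):
        conda_meta = store / name / "conda-meta"
        conda_meta.mkdir(parents=True)
        (conda_meta / "example-pkg-1.0-0.json").write_text("{}")  # placeholder
        os.symlink(store / name, envs / name)
    # Simulate removing the environment while leaving its symlink behind.
    shutil.rmtree(store / "removed-env")
    return envs
```

With a directory built by `build_demo_envs(tempfile.mkdtemp())`, a naive `os.listdir(envs / "removed-env" / "conda-meta")` raises `FileNotFoundError` with `[Errno 2] No such file or directory`, the same kind of message reported in this issue, while `list_env_packages(envs)` returns only the live environment. Cleaning up the symlinks when an environment is removed would address the cause rather than the symptom.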