[BUG] - Unable to create Dask clusters

OS system and architecture in which you are running QHub

GCP

Expected behavior

Ability to create dask clusters via dask-gateway

Actual behavior

Unable to create Dask Clusters

How to Reproduce the problem?

Choose the filesystem/dask environment OR create a new environment with the qhub-dask package.

from dask_gateway import Gateway

☝🏽 This import raises a deprecation warning, which is probably not an issue right now.

/home/conda/filesystem/418492d4df19a27c85d04dd5e3e624b767b129c055cd3ee87f0005b5bb10318e-20220310-172659-095264-2-dask/lib/python3.9/site-packages/dask_gateway/client.py:21: FutureWarning: format_bytes is deprecated and will be removed in a future release. Please use dask.utils.format_bytes instead.
  from distributed.utils import LoopRunner, format_bytes
gateway = Gateway()
options = gateway.cluster_options()

The cluster_options call fails with:

GatewayServerError: 500 Internal Server Error

Server got itself in trouble

And then

cluster = gateway.new_cluster()

fails with an error that references an odd-looking folder name:

ValueError: [Errno 2] No such file or directory: '/home/conda/admin/envs/test-admin/conda-meta'

Command output

>>> options = gateway.cluster_options()

---------------------------------------------------------------------------
GatewayServerError                        Traceback (most recent call last)
Input In [4], in <cell line: 4>()
      1 from dask_gateway import Gateway
      2 gateway = Gateway()
----> 4 options = gateway.cluster_options()
      5 options

File /home/conda/dharhas@quansight.com/ef17a653925332c287418a5f465b971dedcea0bb1902e3abe411ee5c2372e0d8-20220311-013959-007907-9-dask-geo2/lib/python3.9/site-packages/dask_gateway/client.py:518, in Gateway.cluster_options(self, use_local_defaults, **kwargs)
    504 def cluster_options(self, use_local_defaults=True, **kwargs):
    505     """Get the available cluster configuration options.
    506 
    507     Parameters
   (...)
    516         A dict of cluster options.
    517     """
--> 518     return self.sync(
    519         self._cluster_options, use_local_defaults=use_local_defaults, **kwargs
    520     )

File /home/conda/dharhas@quansight.com/ef17a653925332c287418a5f465b971dedcea0bb1902e3abe411ee5c2372e0d8-20220311-013959-007907-9-dask-geo2/lib/python3.9/site-packages/dask_gateway/client.py:343, in Gateway.sync(self, func, *args, **kwargs)
    339 future = asyncio.run_coroutine_threadsafe(
    340     func(*args, **kwargs), self.loop.asyncio_loop
    341 )
    342 try:
--> 343     return future.result()
    344 except BaseException:
    345     future.cancel()

File /home/conda/dharhas@quansight.com/ef17a653925332c287418a5f465b971dedcea0bb1902e3abe411ee5c2372e0d8-20220311-013959-007907-9-dask-geo2/lib/python3.9/concurrent/futures/_base.py:446, in Future.result(self, timeout)
    444     raise CancelledError()
    445 elif self._state == FINISHED:
--> 446     return self.__get_result()
    447 else:
    448     raise TimeoutError()

File /home/conda/dharhas@quansight.com/ef17a653925332c287418a5f465b971dedcea0bb1902e3abe411ee5c2372e0d8-20220311-013959-007907-9-dask-geo2/lib/python3.9/concurrent/futures/_base.py:391, in Future.__get_result(self)
    389 if self._exception:
    390     try:
--> 391         raise self._exception
    392     finally:
    393         # Break a reference cycle with the exception in self._exception
    394         self = None

File /home/conda/dharhas@quansight.com/ef17a653925332c287418a5f465b971dedcea0bb1902e3abe411ee5c2372e0d8-20220311-013959-007907-9-dask-geo2/lib/python3.9/site-packages/dask_gateway/client.py:497, in Gateway._cluster_options(self, use_local_defaults)
    495 async def _cluster_options(self, use_local_defaults=True):
    496     url = "%s/api/v1/options" % self.address
--> 497     resp = await self._request("GET", url)
    498     data = await resp.json()
    499     options = Options._from_spec(data["cluster_options"])

File /home/conda/dharhas@quansight.com/ef17a653925332c287418a5f465b971dedcea0bb1902e3abe411ee5c2372e0d8-20220311-013959-007907-9-dask-geo2/lib/python3.9/site-packages/dask_gateway/client.py:417, in Gateway._request(self, method, url, json)
    415     raise GatewayClusterError(msg)
    416 elif resp.status == 500:
--> 417     raise GatewayServerError(msg)
    418 else:
    419     resp.raise_for_status()

GatewayServerError: 500 Internal Server Error

Server got itself in trouble


>>> cluster = gateway.new_cluster()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [3], in <cell line: 1>()
----> 1 cluster = gateway.new_cluster()

File /home/conda/filesystem/418492d4df19a27c85d04dd5e3e624b767b129c055cd3ee87f0005b5bb10318e-20220310-172659-095264-2-dask/lib/python3.9/site-packages/dask_gateway/client.py:639, in Gateway.new_cluster(self, cluster_options, shutdown_on_close, **kwargs)
    616 def new_cluster(self, cluster_options=None, shutdown_on_close=True, **kwargs):
    617     """Submit a new cluster to the gateway, and wait for it to be started.
    618 
    619     Same as calling ``submit`` and ``connect`` in one go.
   (...)
    637     cluster : GatewayCluster
    638     """
--> 639     return GatewayCluster(
    640         address=self.address,
    641         proxy_address=self.proxy_address,
    642         public_address=self._public_address,
    643         auth=self.auth,
    644         asynchronous=self.asynchronous,
    645         loop=self.loop,
    646         cluster_options=cluster_options,
    647         shutdown_on_close=shutdown_on_close,
    648         **kwargs,
    649     )

File /home/conda/filesystem/418492d4df19a27c85d04dd5e3e624b767b129c055cd3ee87f0005b5bb10318e-20220310-172659-095264-2-dask/lib/python3.9/site-packages/dask_gateway/client.py:814, in GatewayCluster.__init__(self, address, proxy_address, public_address, auth, cluster_options, shutdown_on_close, asynchronous, loop, **kwargs)
    802 def __init__(
    803     self,
    804     address=None,
   (...)
    812     **kwargs,
    813 ):
--> 814     self._init_internal(
    815         address=address,
    816         proxy_address=proxy_address,
    817         public_address=public_address,
    818         auth=auth,
    819         cluster_options=cluster_options,
    820         cluster_kwargs=kwargs,
    821         shutdown_on_close=shutdown_on_close,
    822         asynchronous=asynchronous,
    823         loop=loop,
    824     )

File /home/conda/filesystem/418492d4df19a27c85d04dd5e3e624b767b129c055cd3ee87f0005b5bb10318e-20220310-172659-095264-2-dask/lib/python3.9/site-packages/dask_gateway/client.py:919, in GatewayCluster._init_internal(self, address, proxy_address, public_address, auth, cluster_options, cluster_kwargs, shutdown_on_close, asynchronous, loop, name)
    917     self.status = "starting"
    918 if not self.asynchronous:
--> 919     self.gateway.sync(self._start_internal)

File /home/conda/filesystem/418492d4df19a27c85d04dd5e3e624b767b129c055cd3ee87f0005b5bb10318e-20220310-172659-095264-2-dask/lib/python3.9/site-packages/dask_gateway/client.py:343, in Gateway.sync(self, func, *args, **kwargs)
    339 future = asyncio.run_coroutine_threadsafe(
    340     func(*args, **kwargs), self.loop.asyncio_loop
    341 )
    342 try:
--> 343     return future.result()
    344 except BaseException:
    345     future.cancel()

File /home/conda/filesystem/418492d4df19a27c85d04dd5e3e624b767b129c055cd3ee87f0005b5bb10318e-20220310-172659-095264-2-dask/lib/python3.9/concurrent/futures/_base.py:446, in Future.result(self, timeout)
    444     raise CancelledError()
    445 elif self._state == FINISHED:
--> 446     return self.__get_result()
    447 else:
    448     raise TimeoutError()

File /home/conda/filesystem/418492d4df19a27c85d04dd5e3e624b767b129c055cd3ee87f0005b5bb10318e-20220310-172659-095264-2-dask/lib/python3.9/concurrent/futures/_base.py:391, in Future.__get_result(self)
    389 if self._exception:
    390     try:
--> 391         raise self._exception
    392     finally:
    393         # Break a reference cycle with the exception in self._exception
    394         self = None

File /home/conda/filesystem/418492d4df19a27c85d04dd5e3e624b767b129c055cd3ee87f0005b5bb10318e-20220310-172659-095264-2-dask/lib/python3.9/site-packages/dask_gateway/client.py:933, in GatewayCluster._start_internal(self)
    931     self._start_task = asyncio.ensure_future(self._start_async())
    932 try:
--> 933     await self._start_task
    934 except BaseException:
    935     # On exception, cleanup
    936     await self._stop_internal()

File /home/conda/filesystem/418492d4df19a27c85d04dd5e3e624b767b129c055cd3ee87f0005b5bb10318e-20220310-172659-095264-2-dask/lib/python3.9/site-packages/dask_gateway/client.py:946, in GatewayCluster._start_async(self)
    944 if self.status == "created":
    945     self.status = "starting"
--> 946     self.name = await self.gateway._submit(
    947         cluster_options=self._cluster_options, **self._cluster_kwargs
    948     )
    949 # Connect to cluster
    950 try:

File /home/conda/filesystem/418492d4df19a27c85d04dd5e3e624b767b129c055cd3ee87f0005b5bb10318e-20220310-172659-095264-2-dask/lib/python3.9/site-packages/dask_gateway/client.py:535, in Gateway._submit(self, cluster_options, **kwargs)
    533     options = self._config_cluster_options()
    534     options.update(kwargs)
--> 535 resp = await self._request("POST", url, json={"cluster_options": options})
    536 data = await resp.json()
    537 return data["name"]

File /home/conda/filesystem/418492d4df19a27c85d04dd5e3e624b767b129c055cd3ee87f0005b5bb10318e-20220310-172659-095264-2-dask/lib/python3.9/site-packages/dask_gateway/client.py:413, in Gateway._request(self, method, url, json)
    410     msg = await resp.text()
    412 if resp.status in {404, 422}:
--> 413     raise ValueError(msg)
    414 elif resp.status == 409:
    415     raise GatewayClusterError(msg)

ValueError: [Errno 2] No such file or directory: '/home/conda/admin/envs/test-admin/conda-meta'
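
One way to narrow down a "No such file or directory" error like the one above is to check whether some component of the reported path is a symlink whose target has been deleted. The snippet below is only a diagnostic sketch: the helper name first_broken_symlink is hypothetical (it is not part of QHub, conda-store, or dask-gateway), and it assumes you can run Python on the machine where the path is resolved.

```python
import os
from pathlib import Path
from typing import Optional


def first_broken_symlink(path: str) -> Optional[Path]:
    """Return the outermost component of `path` that is a dangling symlink.

    Hypothetical helper for illustration. Returns None when no component
    of the path is a dangling symlink.
    """
    p = Path(path)
    # Candidates ordered from the root down to the full path itself.
    candidates = list(p.parents)[::-1] + [p]
    for candidate in candidates:
        # islink() is true for any symlink; exists() follows the link,
        # so it is false once the link target has been deleted.
        if os.path.islink(candidate) and not os.path.exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Path copied from the ValueError in the traceback above.
    print(first_broken_symlink("/home/conda/admin/envs/test-admin/conda-meta"))
```

If this prints a path, that component is a symlink pointing at a target that no longer exists. If it prints None, the path is missing for some other reason.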

Versions and dependencies used.

conda 4.11.0
QHub 0.4 RC

Compute environment

GCP

Integrations

Dask

Anything else?

No response

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 13 (13 by maintainers)

Top GitHub Comments

2 reactions
viniciusdc commented, Mar 14, 2022

Hi @costrouc @iameskild, based on the above discussion, this seems to be the issue:

  • Removing a conda-store environment leaves broken symlinks behind. As the error messages and the configuration of get_packages show, the function looks for /home/conda/group/envs/env_name based on the list of symlinks available in /group/envs. Right now those symlinks are not deleted when the environment is removed from conda-store, so the function tries to open a path that no longer exists.

The solutions are:

  • Remove deleted environments from get_packages search as pointed out by Eskild above #1162
  • Or remove the broken symlinks during env removal.
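
Both options can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not the actual get_packages implementation or the fix that was merged: the function names are hypothetical, and envs_dir stands in for a directory of environment symlinks such as the /group/envs location mentioned above.

```python
import os
from pathlib import Path
from typing import List


def is_broken_symlink(path: Path) -> bool:
    # is_symlink() inspects the link itself; exists() follows it,
    # so a dangling link is a symlink that does not "exist".
    return path.is_symlink() and not path.exists()


def list_valid_envs(envs_dir: Path) -> List[Path]:
    """Option 1 (hypothetical): skip entries whose symlink target is gone."""
    return sorted(p for p in envs_dir.iterdir() if not is_broken_symlink(p))


def prune_broken_symlinks(envs_dir: Path) -> List[Path]:
    """Option 2 (hypothetical): delete dangling symlinks and report them."""
    removed = []
    # sorted() materializes the listing before anything is unlinked.
    for p in sorted(envs_dir.iterdir()):
        if is_broken_symlink(p):
            p.unlink()  # removes only the link, never a target
            removed.append(p)
    return removed
```

Filtering at lookup time (option 1) tolerates links that were already left behind, while pruning at removal time (option 2) keeps the directory clean. The two are not mutually exclusive.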

1 reaction
iameskild commented, Mar 11, 2022

As another sanity check, I deployed qhub on GCP using v0.4.0rc2 and this isn’t an issue. This means that this bug must have been introduced sometime in the last 8 days, since this commit: https://github.com/Quansight/qhub/commit/62196c98e86234264f0b26a43f46cb5df0c0fafe
