Connecting to Dask Gateway via NGINX reverse proxy - cluster.get_client() results in TimeoutError (because gateway:// TCP traffic blocked)
Context
We have deployed Dask Gateway (0.9.0) via Helm, exposing the Traefik proxy via an externally-facing NGINX proxy. External traffic is SSL-encrypted (https), and behind the proxy all traffic is http. The prefix value is /dask-gateway, so access to Dask Gateway is via the URL https://[domain]/dask-gateway, where [domain] is the domain name configured on the proxy machine.
A section of the NGINX location (which may be relevant to the issue) is:
location /dask-gateway/ {
    proxy_pass http://<internal_ip>:80/dask-gateway/;
    proxy_redirect off;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection 'upgrade';
    proxy_set_header Host $host;
    proxy_cache_bypass $http_upgrade;
}
What happened:
After instantiating a new GatewayCluster, I cannot call the get_client() method on it; doing so yields a TimeoutError. The error reads:
OSError: Timed out trying to connect to gateway://<domain>:443/my-dask-gateway.42e(...)2eb after 10 s
As the prefix value for the Dask Gateway deployment is /dask-gateway, it looks like this could be related to that endpoint not being added to the URL.
What you expected to happen:
To receive a handle to a Client object.
Minimal Complete Verifiable Example:
from dask_gateway import Gateway, GatewayCluster
cluster = GatewayCluster('https://[domain]/dask-gateway/', auth="jupyterhub")
gateway = Gateway('https://[domain]/dask-gateway/', auth="jupyterhub")
gateway.list_clusters()
# [ClusterReport<name=my-dask-gateway.42e(...)2eb, status=RUNNING>]
cluster.scale(2)
client = cluster.get_client()
# OSError: Timed out trying to connect to gateway://<domain>:443/my-dask-gateway.42e(...)2eb after 10 s
Anything else we need to know?:
I’ve seen elsewhere that this could be related to the versions of dask or distributed being out of sync, but I am unsure exactly what versions are running on the Dask Gateway deployment (I’ve just deployed the latest version of the Helm chart, 0.9.0), or how to check.
The client environment is a Binder-generated JupyterHub environment built from https://github.com/dask/dask-examples, which by default does not include the dask-gateway Python package.
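On the question of how to check versions, a minimal sketch for the client side only is shown below. It assumes Python 3.8+ (for importlib.metadata), and the helper name installed_versions is hypothetical. It does not reveal what is running on the Dask Gateway deployment itself; it only reports what the notebook environment has installed, so that it can be compared against the scheduler and worker image.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("dask", "distributed", "dask-gateway")):
    """Map each package name to its installed version, or None if it is absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Package is not installed in this environment.
            found[name] = None
    return found


# Print the client-side versions for comparison with the deployment's image.
print(installed_versions())
```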
The Dask Gateway is configured as a service of the test JupyterHub deployment for the purposes of authentication.
Environment:
- Dask version: 2.20.0
- Python version: 3.8.12
- Operating System: Ubuntu 18.04 (Jupyter base notebook container)
- Install method (conda, pip, source): pip
Top GitHub Comments
It took me a while to get back to this. As @martindurant and @rigzba21 suggested, the problem I was seeing was a result of the gateway:// TCP traffic getting stuck at our NGINX reverse proxy. Here are some notes to hopefully help those with a similar setup (if there are any!) work around the issue.

Optional Debugging to verify deployment:
(Feel free to skip to NGINX config section below.)
For debugging, it can be helpful to test by directing traffic from within the same K8s cluster, in much the same way as I mentioned above.
First, it was helpful to test from a DaskHub deployment’s JupyterHub service. By default this will point to the DaskHub’s own Dask Gateway instance, but by passing a URL to a separate Dask Gateway deployment (the one we’re debugging), we can test this with a client that is known to work. I noticed this requires the proxy_address argument to be explicitly set (otherwise the gateway:// traffic is still sent to the daskhub namespace’s Traefik proxy). In our case the service name is traefik-my-dask-gateway, the namespace is my-dask-gateway, and 8083 is the internal port the service is listening on.

Once this has been verified, it can be tested from a standalone JupyterHub instance, but I’d recommend starting with the same environment used by DaskHub (https://github.com/dask/helm-chart/blob/main/daskhub/values.yaml#L57). Version mismatch issues can seemingly also yield the "OSError: Timed out trying to connect to gateway://" error, so testing from an environment known to work is helpful.

NGINX config for forwarding gateway:// traffic
The stream directive must be used to forward traffic. See https://docs.nginx.com/nginx/admin-guide/load-balancer/tcp-udp-load-balancer/ as linked above.
An example block would look like:
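A minimal sketch of such a block follows, assuming the NGINX build includes the stream module. <stream_port> is a hypothetical placeholder (not from the original notes) for whichever free port on the proxy machine is chosen to accept gateway:// traffic; the other two placeholders are the ones described in the text.

```nginx
# Top-level context in nginx.conf: a sibling of the http block, not nested inside it.
stream {
    server {
        # Hypothetical placeholder: any free port on the proxy machine.
        listen <stream_port>;
        # Raw TCP forwarding to the Traefik proxy; host:port only, no protocol prefix.
        proxy_pass <dask-gateway_host_ip>:<traefik_nodeport>;
    }
}
```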
where <traefik_nodeport> is the high port number (typically 3XXXX). Note that there cannot be a protocol (http://) ahead of the <dask-gateway_host_ip>.

With this, the service can be queried from clients inside or outside the cluster.
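As a rough sketch of what such a client call can look like, the snippet below builds the two addresses and passes proxy_address explicitly to Gateway, as recommended in the notes above. The helper names build_addresses and connect are hypothetical, as are the example port 8786 and the in-cluster API URL; only the service name, namespace, port 8083, the /dask-gateway prefix, and auth="jupyterhub" come from this thread.

```python
def build_addresses(api_host, proxy_host, proxy_port,
                    prefix="/dask-gateway", scheme="https"):
    """Return (address, proxy_address) suitable for dask_gateway.Gateway."""
    # HTTP(S) API endpoint, reached through the NGINX location block.
    address = f"{scheme}://{api_host}{prefix}/"
    # TCP endpoint for scheduler traffic, reached through the stream block.
    proxy_address = f"gateway://{proxy_host}:{proxy_port}"
    return address, proxy_address


def connect(address, proxy_address, auth="jupyterhub"):
    """Create a Gateway client with an explicit proxy address."""
    # Imported lazily so build_addresses works without dask-gateway installed.
    from dask_gateway import Gateway
    return Gateway(address, proxy_address=proxy_address, auth=auth)


# External client (assumes NGINX streams on a hypothetical port 8786):
#   address, proxy = build_addresses("[domain]", "[domain]", 8786)
# In-cluster client (API URL is an assumption; host and port are from the notes):
#   svc = "traefik-my-dask-gateway.my-dask-gateway"
#   address, proxy = build_addresses(svc, svc, 8083, scheme="http")
# Then:
#   gateway = connect(address, proxy)
#   cluster = gateway.new_cluster()
#   client = cluster.get_client()
```

Keeping the address construction separate from the connection makes it easy to print and inspect both URLs before any network call is attempted.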
Thank you soo much @JColl88 for reporting, investigating, and following this up so clearly!!! I’ve also learned from your experience now 😃
All the best!