Connecting to Dask Gateway via NGINX reverse proxy - cluster.get_client() results in TimeoutError (because gateway:// TCP traffic blocked)

Context

We have deployed Dask Gateway (0.9.0) via Helm, exposing the Traefik proxy via an externally-facing NGINX proxy. External traffic is SSL-encrypted (https), and behind the proxy all traffic is http. The prefix value is /dask-gateway, so access to Dask Gateway is via the URL https://[domain]/dask-gateway, where [domain] is the domain name configured on the proxy machine.

The NGINX location block (which may be relevant to the issue) is:

location /dask-gateway/ {
    proxy_pass http://<internal_ip>:80/dask-gateway/;
    proxy_redirect off;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection 'upgrade';
    proxy_set_header Host $host;
    proxy_cache_bypass $http_upgrade;
}

What happened:

After instantiating a new GatewayCluster, I cannot call the get_client() method on it; the call times out. The error reads:

OSError: Timed out trying to connect to gateway://<domain>:443/my-dask-gateway.42e(...)2eb after 10 s

As the prefix value for the Dask Gateway deployment is /dask-gateway, it looks like this could be related to that prefix not being added to the URL.
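
The error shows gateway://<domain>:443/... even though the HTTP address is https://[domain]/dask-gateway/, which suggests the scheduler-proxy address keeps only the host and port of the HTTP address (port 443 for https) and drops the path. The sketch below mimics that observed behaviour; derive_proxy_address is a hypothetical helper written for illustration, not dask_gateway's own code. It shows why the /dask-gateway prefix never appears, and why gateway:// TCP traffic ends up on port 443 of the NGINX host.

```python
from typing import Optional, Union
from urllib.parse import urlparse


def derive_proxy_address(address: str,
                         proxy_address: Optional[Union[int, str]] = None) -> str:
    """Hypothetical helper: build a gateway:// address from the HTTP address.

    Assumption: only host and port are kept; any path such as
    /dask-gateway/ is dropped, matching the error message above.
    """
    parsed = urlparse(address)
    if proxy_address is None:
        # No explicit proxy: reuse the HTTP host, defaulting the port by scheme.
        if parsed.port is not None:
            port = parsed.port
        elif parsed.scheme == "https":
            port = 443
        else:
            port = 80
        netloc = f"{parsed.hostname}:{port}"
    elif isinstance(proxy_address, int):
        # An integer is treated as a port on the same host as `address`.
        netloc = f"{parsed.hostname}:{proxy_address}"
    else:
        # A string may be a full URL; only its host:port part is used.
        proxy = urlparse(proxy_address)
        netloc = proxy.netloc or proxy_address
    return f"gateway://{netloc}"


print(derive_proxy_address("https://example.com/dask-gateway/"))
# gateway://example.com:443
```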

What you expected to happen:

To receive a handle to a Client object.

Minimal Complete Verifiable Example:

from dask_gateway import Gateway, GatewayCluster
cluster = GatewayCluster('https://[domain]/dask-gateway/', auth="jupyterhub")
gateway = Gateway('https://[domain]/dask-gateway/', auth="jupyterhub")
gateway.list_clusters()
# [ClusterReport<name=my-dask-gateway.42e(...)2eb, status=RUNNING>]
cluster.scale(2)
client = cluster.get_client()
# OSError: Timed out trying to connect to gateway://<domain>:443/my-dask-gateway.42e(...)2eb after 10 s

Anything else we need to know?:

I’ve seen elsewhere that this could be caused by the dask or distributed versions being out of sync, but I’m unsure exactly which versions are running on the Dask Gateway deployment (I’ve just deployed the latest version of the Helm chart, 0.9.0), or how to check.
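
On the client side, installed versions can be listed with the standard library, as in the minimal sketch below (local_versions is a hypothetical helper; importlib.metadata needs Python 3.8+, which matches the environment listed below). It cannot see the scheduler or workers. Once a client does connect, distributed’s Client.get_versions(check=True) is one way to compare client, scheduler and worker versions.

```python
from importlib.metadata import PackageNotFoundError, version


def local_versions(packages=("dask", "distributed", "dask-gateway")):
    """Hypothetical helper: map each package to its installed version, or None."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed in this environment, e.g. dask-gateway is absent
            # from the dask-examples Binder image by default.
            found[name] = None
    return found


for name, ver in local_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```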

The client environment is a Binder-generated JupyterHub environment built from https://github.com/dask/dask-examples, which by default does not include the dask-gateway Python package.

The Dask Gateway is configured as a service of the test JupyterHub deployment for the purposes of authentication.

Environment:

  • Dask version: 2.20.0
  • Python version: 3.8.12
  • Operating System: Ubuntu 18.04 (Jupyter base notebook container)
  • Install method (conda, pip, source): pip

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 20 (9 by maintainers)

Top GitHub Comments

JColl88 commented, Nov 10, 2022

It took me a while to get back to this. As @martindurant and @rigzba21 suggested, the problem I was seeing was the result of gateway:// TCP traffic getting stuck at our NGINX reverse proxy. Here are some notes that will hopefully help anyone with a similar setup (if there are any!) work around the issue.

Optional debugging to verify the deployment:

(Feel free to skip to NGINX config section below.)

For debugging, it can be helpful to test by directing traffic from within the same K8s cluster, in much the same way as I mentioned above.

First, it was helpful to test from a DaskHub deployment’s JupyterHub service. By default, this points to DaskHub’s own Dask Gateway instance, but by passing the URL of a separate Dask Gateway deployment (the one being debugged), we can test it with a client that is known to work. I noticed that this requires the proxy_address parameter to be set explicitly (otherwise the gateway:// traffic is still sent to the daskhub namespace’s Traefik proxy). E.g.:

from dask_gateway import GatewayCluster

cluster = GatewayCluster(
    address='http://traefik-my-dask-gateway.my-dask-gateway:8083/dask-gateway/',
    proxy_address='http://traefik-my-dask-gateway.my-dask-gateway:8083/dask-gateway/',
    auth="jupyterhub"
)
cluster.get_client()

where the service name is traefik-my-dask-gateway, the namespace is my-dask-gateway, and 8083 is the internal port the service listens on.
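
The in-cluster address above follows the Kubernetes service DNS form <service>.<namespace>:<port>. A hypothetical helper composing it from those three values (plus the prefix) might look like the sketch below; the live GatewayCluster call is left commented out because it needs a running deployment.

```python
def in_cluster_gateway_url(service: str, namespace: str, port: int,
                           prefix: str = "/dask-gateway/") -> str:
    """Hypothetical helper: in-cluster URL for a Dask Gateway Traefik service."""
    # Pods in the same cluster can resolve <service>.<namespace> via cluster DNS.
    stripped = prefix.strip("/")
    path = f"/{stripped}/" if stripped else "/"
    return f"http://{service}.{namespace}:{port}{path}"


url = in_cluster_gateway_url("traefik-my-dask-gateway", "my-dask-gateway", 8083)
print(url)  # http://traefik-my-dask-gateway.my-dask-gateway:8083/dask-gateway/

# Usage against a live deployment (not run here):
# from dask_gateway import GatewayCluster
# cluster = GatewayCluster(address=url, proxy_address=url, auth="jupyterhub")
```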

Once this has been verified, it can be tested from a standalone JupyterHub instance, but I’d recommend starting with the same environment used by DaskHub (https://github.com/dask/helm-chart/blob/main/daskhub/values.yaml#L57). Version mismatches can apparently also produce the OSError: Timed out trying to connect to gateway:// error, so testing from an environment known to work is helpful.

NGINX config for forwarding gateway:// traffic

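The stream directive must be used to forward this traffic, because gateway:// connections are raw TCP rather than HTTP and are therefore not handled by an http location block. See https://docs.nginx.com/nginx/admin-guide/load-balancer/tcp-udp-load-balancer/ as linked above.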

An example block would look like:

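stream {
    # Top-level context (alongside, not inside, the http block); forwards raw TCP.
    server {
        listen 80;
        # host:port only; proxy_pass in the stream context takes no scheme
        proxy_pass <dask-gateway_host_ip>:<traefik_nodeport>;
    }
}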

where <traefik_nodeport> is the NodePort of the Traefik service (a high port number, typically 3XXXX). Note that there cannot be a protocol (http://) ahead of <dask-gateway_host_ip>.
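
The two constraints above (a stream block rather than an http location, and no scheme in proxy_pass) can be captured in a small generator. render_stream_block is a hypothetical helper for illustration, not part of NGINX or dask_gateway, and the IP and port in the usage line are placeholders. The rendered block still has to sit at the top level of nginx.conf, and NGINX must be built with the stream module.

```python
def render_stream_block(upstream_host: str, nodeport: int, listen_port: int = 80) -> str:
    """Hypothetical helper: render an NGINX stream block that forwards raw TCP."""
    if "://" in upstream_host:
        # In the stream context, proxy_pass takes host:port only, never a scheme.
        raise ValueError("upstream_host must not include a scheme such as http://")
    for port in (nodeport, listen_port):
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid port: {port}")
    return (
        "stream {\n"
        "    server {\n"
        f"        listen {listen_port};\n"
        f"        proxy_pass {upstream_host}:{nodeport};\n"
        "    }\n"
        "}\n"
    )


# Placeholder values; substitute the gateway host IP and Traefik NodePort.
print(render_stream_block("10.0.0.12", 31080))
```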

With this in place, the service can be queried from clients inside or outside the cluster, e.g.:

from dask_gateway import Gateway, GatewayCluster
cluster = GatewayCluster('http://<proxy_floating_ip>:80/dask-gateway/', auth="jupyterhub")
gateway = Gateway('http://<proxy_floating_ip>:80/dask-gateway/', auth="jupyterhub")
gateway.list_clusters()
# [ClusterReport<name=my-dask-gateway.5d2...a4b, status=RUNNING>]
client = cluster.get_client()
print(client)
# <Client: 'tls://10.243.2.54:8786' processes=0 threads=0, memory=0 B>
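
When the symptom is gateway:// traffic stalling at the proxy, a plain TCP check against the NGINX host’s stream port can narrow things down before involving dask_gateway at all. The sketch below uses only the standard library; tcp_port_open is a hypothetical helper, and a successful connection only shows that TCP gets through, not that Traefik then routes the gateway:// (TLS) traffic correctly.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Hypothetical helper: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out (socket.timeout subclasses OSError).
        return False


# e.g. tcp_port_open("<proxy_floating_ip>", 80), replacing the placeholder
```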

consideRatio commented, Nov 11, 2022

Thank you soo much @JColl88 for reporting, investigating, and following this up so clearly!!! I’ve also learned from your experience now 😃

All the best!
