address / proxy_address in dask-gateway's Gateway constructor

Background

I’m confused about how to initialize dask_gateway.Gateway.

My desired outcomes

  1. gateway = Gateway(address=..., proxy_address=...) without errors
  2. cluster = gateway.new_cluster() without errors
  3. client = cluster.get_client() without errors
  4. client links are usable from a browser

My open questions

  • Will https cause issues?
  • Will a path prefix cause issues?

My setup

  • dask-gateway is configured with a prefix /services/dask-gateway, which means traefik will strip it.
  • jupyterhub is configured to proxy /services/dask-gateway to dask-gateway’s traefik
  • nginx-ingress is configured to terminate TLS for the jupyterhub proxy

Addresses of relevance. I’ve tested that all of these work with the /api/version endpoint.

  • https://jupyter.example.com/services/dask-gateway
    • This is accessible by my users’ browsers, while the others below will only be accessible from the server they control through jupyterhub.
  • http://proxy-public.jupyterhub/services/dask-gateway
    • This relies on the <k8s servicename>.<k8s namespace> DNS name, and on JupyterHub being configured to forward traffic to dask-gateway’s traefik.
  • http://traefik-dask-gateway.dask-gateway/services/dask-gateway
  • http://api-dask-gateway.dask-gateway:8000
    • Note that the prefix is excluded here because the dask-gateway-server doesn’t expect it. It is only expected by traefik, which strips it and passes the request onwards.
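
As a sketch of how such a check could be scripted, the snippet below builds the /api/version URL for each base address while preserving any path prefix. It uses only the Python standard library. `version_url` and `check` are hypothetical helper names, not part of dask-gateway, and `check` can only succeed from a machine that can actually reach the given address.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen

# The four base addresses listed above.
BASE_ADDRESSES = [
    "https://jupyter.example.com/services/dask-gateway",
    "http://proxy-public.jupyterhub/services/dask-gateway",
    "http://traefik-dask-gateway.dask-gateway/services/dask-gateway",
    "http://api-dask-gateway.dask-gateway:8000",
]


def version_url(base: str) -> str:
    """Append /api/version to a base address, keeping any path prefix."""
    parts = urlsplit(base)
    path = parts.path.rstrip("/") + "/api/version"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def check(base: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of the version endpoint (hypothetical helper).

    urlopen raises an exception on connection failures and on 4xx/5xx replies.
    """
    with urlopen(version_url(base), timeout=timeout) as resp:
        return resp.status
```

Calling `check(base)` for each entry of `BASE_ADDRESSES` from a user pod would show which addresses are reachable from that location.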

cluster.get_client() outcomes

Attempt 1

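from dask_gateway import Gateway

gateway = Gateway(
    address="http://api-dask-gateway.dask-gateway:8000",
    proxy_address="https://jupyter.example.com/services/dask-gateway",
)
# Assumption: the original snippet did not show how `options` was built.
options = gateway.cluster_options()
cluster = gateway.new_cluster(options)
client = cluster.get_client()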

Note how this fails with a connection attempt to gateway://jupyter.example.com/dask-gateway.e809a46ff48742e1add7802ff563b0d2, which looks like a failure caused by the path prefix not being taken into account.

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/distributed/comm/core.py in connect(addr, timeout, deserialize, connection_args)
    231             if not comm:
--> 232                 _raise(error)
    233         except FatalCommClosedError:

/opt/conda/lib/python3.6/site-packages/distributed/comm/core.py in _raise(error)
    212         )
--> 213         raise IOError(msg)
    214 

OSError: Timed out trying to connect to 'gateway://jupyter.example.com/dask-gateway.e809a46ff48742e1add7802ff563b0d2' after 10 s: in <dask_gateway.comm.GatewayConnector object at 0x7f4827c9bb70>: ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
<ipython-input-24-07e27b102cb5> in <module>
----> 1 client = cluster.get_client()
      2 client

/opt/conda/lib/python3.6/site-packages/dask_gateway/client.py in get_client(self, set_as_default)
   1026             set_as_default=set_as_default,
   1027             asynchronous=self.asynchronous,
-> 1028             loop=self.loop,
   1029         )
   1030         if not self.asynchronous:

/opt/conda/lib/python3.6/site-packages/distributed/client.py in __init__(self, address, loop, timeout, set_as_default, scheduler_file, security, asynchronous, name, heartbeat_interval, serializers, deserializers, extensions, direct_to_workers, **kwargs)
    719             ext(self)
    720 
--> 721         self.start(timeout=timeout)
    722         Client._instances.add(self)
    723 

/opt/conda/lib/python3.6/site-packages/distributed/client.py in start(self, **kwargs)
    892             self._started = asyncio.ensure_future(self._start(**kwargs))
    893         else:
--> 894             sync(self.loop, self._start, **kwargs)
    895 
    896     def __await__(self):

/opt/conda/lib/python3.6/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
    346     if error[0]:
    347         typ, exc, tb = error[0]
--> 348         raise exc.with_traceback(tb)
    349     else:
    350         return result[0]

/opt/conda/lib/python3.6/site-packages/distributed/utils.py in f()
    330             if callback_timeout is not None:
    331                 future = asyncio.wait_for(future, callback_timeout)
--> 332             result[0] = yield future
    333         except Exception as exc:
    334             error[0] = sys.exc_info()

/opt/conda/lib/python3.6/site-packages/tornado/gen.py in run(self)
    733 
    734                     try:
--> 735                         value = future.result()
    736                     except Exception:
    737                         exc_info = sys.exc_info()

/opt/conda/lib/python3.6/site-packages/distributed/client.py in _start(self, timeout, **kwargs)
    990 
    991         try:
--> 992             await self._ensure_connected(timeout=timeout)
    993         except OSError:
    994             await self._close()

/opt/conda/lib/python3.6/site-packages/distributed/client.py in _ensure_connected(self, timeout)
   1047                 self.scheduler.address,
   1048                 timeout=timeout,
-> 1049                 connection_args=self.connection_args,
   1050             )
   1051             comm.name = "Client->Scheduler"

/opt/conda/lib/python3.6/site-packages/distributed/comm/core.py in connect(addr, timeout, deserialize, connection_args)
    241                 backoff = min(backoff, 1)  # wait at most one second
    242             else:
--> 243                 _raise(error)
    244         else:
    245             break

/opt/conda/lib/python3.6/site-packages/distributed/comm/core.py in _raise(error)
    211             error,
    212         )
--> 213         raise IOError(msg)
    214 
    215     backoff = 0.01

OSError: Timed out trying to connect to 'gateway://jupyter.example.com/dask-gateway.e809a46ff48742e1add7802ff563b0d2' after 10 s: Timed out trying to connect to 'gateway://jupyter.example.com/dask-gateway.e809a46ff48742e1add7802ff563b0d2' after 10 s: in <dask_gateway.comm.GatewayConnector object at 0x7f4827c9bb70>: ConnectionRefusedError: [Errno 111] Connection refused
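
The address in the error can be reproduced from the inputs with a small illustration. This is only a sketch of the observed behavior, assuming the host of proxy_address is kept and its path is dropped. `observed_scheduler_address` is a hypothetical function and is not dask-gateway's actual implementation.

```python
from urllib.parse import urlsplit


def observed_scheduler_address(proxy_address: str, cluster_name: str) -> str:
    """Mimic the address seen in the traceback: host kept, path prefix dropped."""
    netloc = urlsplit(proxy_address).netloc
    return f"gateway://{netloc}/{cluster_name}"


addr = observed_scheduler_address(
    "https://jupyter.example.com/services/dask-gateway",
    "dask-gateway.e809a46ff48742e1add7802ff563b0d2",
)
print(addr)
```

The /services/dask-gateway prefix from proxy_address does not appear anywhere in the resulting gateway:// address, which matches the error message.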

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (2 by maintainers)

Top GitHub Comments

3 reactions
jcrist commented, Apr 13, 2020

Yeah, this can be a bit confusing - I think this is more likely a docs issue than an actual bug.

A user of dask-gateway communicates with the server(s) in a few ways:

  • The Gateway client sends REST api calls over HTTP(S) to the api server. This is the address in the Gateway constructor.
  • The Gateway client initiates a client->dask-scheduler connection over TLS, using the gateway:// protocol (our custom client-scheduler proxy scheme). This is the proxy_address in the Gateway constructor.
  • The user may want to view the dask dashboards using their browser. This is the public_address, which is not currently configurable in the constructor (fixable), but is configurable with the dask configuration.

All user connections are intended to pass through traefik. Users should never directly interact with the api server or individual dask clusters.

By default, traefik is configured to run both the http:// and gateway:// protocols through the same port. Users only need to pass in the address to traefik and everything should work fine.

However, things that speak HTTP can have extra proxies in front of them - the JupyterHub proxy for example. In this case users would pass the proxied url as address to the Gateway constructor (e.g. https://jupyterhub-url/services/dask-gateway). But since the JupyterHub proxy doesn’t speak the gateway:// protocol (and thus can’t proxy client->scheduler connections), users will need to configure the proxy_address to point to the traefik proxy directly.

I think the following configuration should work for you:

gateway:
  address: https://jupyter.example.com/services/dask-gateway
  proxy-address: gateway://traefik-dask-gateway.dask-gateway:80

This will pass all API requests through the JupyterHub proxy. Dashboards will also be viewed through the JupyterHub proxy. Dask client connections will go directly through traefik, which is configured to run on port 80 by default.
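
In Python, the same two settings could be passed straight to the constructor. This is a minimal sketch, assuming dask-gateway is installed and that the in-cluster traefik service is reachable under the name used above. `connect` is a hypothetical helper, and the import is done lazily only so the constants can be read without the library present.

```python
# REST API calls go through the JupyterHub proxy (HTTPS, with path prefix).
ADDRESS = "https://jupyter.example.com/services/dask-gateway"
# Client-to-scheduler traffic goes directly to traefik over gateway://.
PROXY_ADDRESS = "gateway://traefik-dask-gateway.dask-gateway:80"


def connect():
    """Create a Gateway, a cluster, and a client (requires dask-gateway)."""
    from dask_gateway import Gateway  # lazy import, see note above

    gateway = Gateway(address=ADDRESS, proxy_address=PROXY_ADDRESS)
    cluster = gateway.new_cluster()
    client = cluster.get_client()
    return gateway, cluster, client
```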

Since your users will be interacting with dask-gateway from inside the k8s cluster, you could configure it to send api requests directly to traefik (skipping the JupyterHub proxy middleman). Dashboards would still be viewed through the JupyterHub proxy.

gateway:
  address: http://traefik-dask-gateway.dask-gateway/services/dask-gateway
  # the correct proxy_address should be automatically inferred from `address` above,
  # but could also be configured explicitly as
  # proxy-address: gateway://traefik-dask-gateway.dask-gateway:80
  public-address: https://jupyter.example.com/services/dask-gateway

This is the configuration I recommend when using dask-gateway with JupyterHub if all requests will be coming from inside the k8s cluster. In this case you could also change the type of the traefik service from LoadBalancer to ClusterIP.
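
The same recommendation could be applied from Python through the dask configuration rather than a YAML file. This is a sketch under the assumption that `Gateway()` picks up `gateway.address` and `gateway.public-address` from the dask config when it is called without arguments. `connect_with_config` is a hypothetical helper name.

```python
# Keys mirror the YAML above, written in dask's dotted-key form.
GATEWAY_CONFIG = {
    "gateway.address": "http://traefik-dask-gateway.dask-gateway/services/dask-gateway",
    "gateway.public-address": "https://jupyter.example.com/services/dask-gateway",
}


def connect_with_config():
    """Set the dask config and build a Gateway from it (requires dask-gateway)."""
    import dask
    from dask_gateway import Gateway

    dask.config.set(GATEWAY_CONFIG)
    # proxy_address is left unset so it can be inferred from the address.
    return Gateway()
```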

Please let me know if the above works for you. I think we have the following TODOs from this issue:

  • Better document the different addresses, both generally and in a k8s context
  • Make public_address configurable in the Gateway constructor

0 reactions
consideRatio commented, Apr 14, 2020

I let this become an issue with too much diverging discussion.

I’ll close this and open a new one, where I copy the relevant open discussion topics regarding how to improve things further.
