Client.gather not awaiting futures created with Client.run
Hi, I’m struggling to run async functions concurrently using Dask distributed. I attempted to use client.run to launch some tasks on dedicated workers in conjunction with client.gather to retrieve the results. As far as I can tell from reading the docs my approach should be correct, hence I am raising it as an issue here; however, I may be missing something, in which case the docs could potentially be improved.
For context, I’m building an application in which user-defined classes represent nodes within a process graph (think of a manufacturing plant, etc.). The nodes execute bespoke code and communicate data via channels (e.g. dask.distributed.Queue). Nodes in the graph may have a large memory footprint (e.g. they contain trained machine learning models). Each node should execute all of its iterations on a single worker until it receives a termination signal. To satisfy this requirement I am using client.run and specifying a single worker, assigning workers to nodes in round-robin fashion. I realise this pattern may not be ideal and is perhaps a bit of a hack; I’m currently exploring how best to implement this.
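(Editor's aside: one alternative worth noting, sketched below under the assumption that each iteration can be expressed as an ordinary task, is client.submit, which also accepts a workers= restriction and returns real Futures that client.gather knows how to await. The process function is hypothetical; this is a sketch, not necessarily the recommended pattern.)

```python
import asyncio
from itertools import cycle

from dask.distributed import Client


def process(x: int) -> int:
    """Stand-in for a node's per-iteration work (hypothetical)."""
    return 2 * x


async def main() -> list:
    # In-process cluster purely for illustration; a real deployment
    # would connect to an existing scheduler address instead.
    client = await Client(asynchronous=True, processes=False, n_workers=2)
    # Round-robin over the available worker addresses.
    workers = cycle(client.scheduler_info()["workers"])
    # client.submit with a workers= restriction returns real Futures,
    # which client.gather does await (unlike the coroutines that
    # client.run returns on an asynchronous client).
    futures = [client.submit(process, i, workers=[next(workers)]) for i in range(3)]
    results = await client.gather(futures)
    await client.close()
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```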
I have created a minimal example which follows the same pattern as my actual application code and reproduces the same issue.
What happened:
I create a list of futures by calling client.run in a loop, passing different arguments to a function targeted to execute on specific workers. I subsequently call client.gather to get back the results from this set of futures. Instead of waiting for the functions to execute, control continues past the client.gather call and the application exits with the warning below.
/usr/lib/python3.7/asyncio/events.py:88: RuntimeWarning: coroutine 'Client._run' was never awaited
self._context.run(self._callback, *self._args)
If I add a call to dask.distributed.wait(futures) before the call to Client.gather then exactly the same behaviour is observed.
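(Editor's aside: on an asynchronous client, distributed.wait itself returns a coroutine that must be awaited, so a bare wait(futures) does nothing, which would explain why adding it makes no difference. The general pitfall can be shown with plain asyncio, no Dask involved; slow_double below is a toy stand-in.)

```python
import asyncio


async def slow_double(x: int) -> int:
    """Toy coroutine standing in for any async work (hypothetical)."""
    await asyncio.sleep(0)
    return 2 * x


async def main() -> None:
    # Calling a coroutine function does NOT run it; it merely creates
    # a coroutine object, and nothing executes until it is awaited.
    coro = slow_double(21)
    assert asyncio.iscoroutine(coro)
    # Only awaiting (or otherwise scheduling) the coroutine runs it.
    # Dropping it un-awaited triggers "coroutine ... was never awaited".
    result = await coro
    print(result)


asyncio.run(main())
```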
What you expected to happen:
I expect that calling Client.gather will wait for all the futures to execute and return the results from the futures rather than just returning the futures themselves. Additionally, I expect that if I call dask.distributed.wait on the list of futures, all the futures passed in will be awaited.
Minimal Complete Verifiable Example:
import asyncio
from itertools import cycle
import time
from dask.distributed import Client, wait
SLEEP_TIME = 2.0  # Time for coroutine to sleep in seconds.


async def foo(x: int, sleep_time: float = SLEEP_TIME) -> int:
    """Sleeps then returns the input value."""
    print(f"Got {x}. Sleeping for {sleep_time}s.")
    await asyncio.sleep(sleep_time)
    print(f"Done for {x}!")
    return x


def bar(x: int, sleep_time: float = SLEEP_TIME) -> int:
    """Sleeps then returns the input value (blocking version)."""
    print(f"Got {x}. Sleeping for {sleep_time}s.")
    time.sleep(sleep_time)
    print(f"Done for {x}!")
    return x


async def main() -> None:
    """Entry point for dask run."""
    # Create an async client using the local machine as a cluster.
    client = await Client(asynchronous=True)
    # Get the list of workers from the scheduler.
    workers = cycle(client.scheduler_info()["workers"])
    t_start = time.time()
    # Assign the functions to workers in round-robin fashion.
    futures = [client.run(foo, i, workers=[next(workers)]) for i in range(3)]
    # futures = [client.run(bar, i, workers=[next(workers)]) for i in range(3)]
    # Await all the futures using gather.
    # wait(futures)  # Explicitly waiting for all the futures makes no difference.
    # NOTE: Future objects are not awaited when calling `client.gather`.
    results = await client.gather(futures)
    # NOTE: Using `asyncio.gather` awaits the futures as expected.
    # results = await asyncio.gather(*futures)
    # Display the collected results.
    print(f"results: {results}")
    print(f"Execution took {time.time() - t_start}s.")
    # Close the client connection.
    await client.close()


if __name__ == "__main__":
    asyncio.get_event_loop().run_until_complete(main())
Anything else we should know:
- If the call to Client.gather is replaced with asyncio.gather then the expected behaviour is observed.
- Replacing the async function foo with the blocking function bar gives the same results.
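(Editor's aside: the first point can be seen in isolation with plain asyncio. asyncio.gather accepts coroutine objects directly, schedules them concurrently, and returns their results in input order. A minimal sketch, no Dask required, using a short sleep in place of the MCVE's 2s one:)

```python
import asyncio
import time


async def foo(x: int, sleep_time: float = 0.1) -> int:
    """Sleeps then returns the input value."""
    await asyncio.sleep(sleep_time)
    return x


async def main() -> None:
    t_start = time.perf_counter()
    # asyncio.gather awaits all three coroutines concurrently.
    results = await asyncio.gather(*(foo(i) for i in range(3)))
    elapsed = time.perf_counter() - t_start
    print(f"results: {results}")
    # Concurrent execution: total time is ~0.1s, not 3 * 0.1s.
    print(f"concurrent: {elapsed < 0.25}")


asyncio.run(main())
```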
Environment:
- Dask version: 2021.6.0
- Python version: 3.7.10
- Operating System: Ubuntu 20.04
- Install method (conda, pip, source): pip
Issue Analytics
- Created 2 years ago
- Comments: 6 (4 by maintainers)
Top GitHub Comments
@fjetter thanks for that suggestion with your example above. That approach hadn’t occurred to me. I will try it out today. I’d like to try it with the k8s deployment of dask but I’m finding that when I deploy the default helm chart on my machine the workers cannot discover the scheduler for some reason. Will open a separate issue on that front if I can’t figure out the problem.
In terms of the pub/sub stuff, I did notice that somewhere. At the moment Queue fits nicely for us behind the abstract interface we’ve defined with our Channel class, and in principle it seems it should do the job, if I can get all this stuff working nicely.

This should be closed via https://github.com/dask/distributed/pull/5151. @chrisk314 feel free to re-open if that’s not the case.