Non-blocking operations on scheduler
Hi, this is just a general discussion of the scheduler’s behavior. I noticed that in `initialize`:
```python
async def run_scheduler():
    async with Scheduler(
        interface=interface,
        protocol=protocol,
        dashboard=dashboard,
        dashboard_address=dashboard_address,
    ) as scheduler:
        comm.bcast(scheduler.address, root=0)
        comm.Barrier()
        await scheduler.finished()
```
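For context, this coroutine lives in dask-mpi’s `initialize`, which is typically driven by a script like the following sketch (the `mpirun` invocation and script name are illustrative):

```python
# Run under MPI, e.g.: mpirun -np 4 python script.py
# Rank 0 runs the scheduler, rank 1 runs this client code,
# and the remaining ranks become workers.
from dask_mpi import initialize
from distributed import Client

initialize()

client = Client()  # connects to the scheduler address broadcast over MPI
print(client.submit(sum, [1, 2, 3]).result())
```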
The scheduler starts with only one thread, and since it has to maintain communication with all workers and the client, I’m curious whether, once execution reaches an `await`, the scheduler simply blocks there and can only move on to other communications after that `await` completes. If so, this seems like a huge communication overhead.
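As far as I understand asyncio, an `await` suspends only the awaiting coroutine, not the whole thread; a single-threaded event loop is free to run other ready coroutines in the meantime. A minimal, self-contained sketch (all names here are illustrative, not Distributed APIs):

```python
import asyncio

async def talk_to(peer: str, delay: float) -> None:
    # The await suspends only this coroutine; the single-threaded
    # event loop switches to other pending coroutines meanwhile.
    print(f"start send to {peer}")
    await asyncio.sleep(delay)  # stands in for an awaitable network send
    print(f"finished send to {peer}")

async def main() -> None:
    # All three "sends" are in flight concurrently on one thread.
    await asyncio.gather(
        talk_to("worker-0", 0.3),
        talk_to("worker-1", 0.2),
        talk_to("worker-2", 0.1),
    )

asyncio.run(main())
```

The question is whether the scheduler’s comm layer actually exploits this, or whether its sends end up serialized in practice.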
For a concrete example from the comm layer, consider `write` in `distributed.comm.ucx`:
```python
@log_errors
async def write(
    self,
    msg: dict,
    serializers: Collection[str] | None = None,
    on_error: str = "message",
) -> int:
    if self.closed():
        raise CommClosedError("Endpoint is closed -- unable to send message")
    try:
        if serializers is None:
            serializers = ("cuda", "dask", "pickle", "error")
        # msg can also be a list of dicts when sending batched messages
        logging.info("send msg={}".format(msg))
        frames = await to_frames(
            msg,
            serializers=serializers,
            on_error=on_error,
            allow_offload=self.allow_offload,
        )
        nframes = len(frames)
        cuda_frames = tuple(hasattr(f, "__cuda_array_interface__") for f in frames)
        sizes = tuple(nbytes(f) for f in frames)
        cuda_send_frames, send_frames = zip(
            *(
                (is_cuda, each_frame)
                for is_cuda, each_frame in zip(cuda_frames, frames)
                if nbytes(each_frame) > 0
            )
        )

        # Send metadata:
        # send close flag and number of frames (_Bool, int64)
        await self.ep.send(struct.pack("?Q", False, nframes))
        # Send which frames are CUDA (bool) and
        # how large each frame is (uint64)
        await self.ep.send(
            struct.pack(nframes * "?" + nframes * "Q", *cuda_frames, *sizes)
        )

        # Send frames.
        # It is necessary to synchronize the default stream before sending.
        # We synchronize the default stream because UCX is not
        # stream-ordered and syncing the default stream will wait for other
        # non-blocking CUDA streams. Note this is only sufficient if the memory
        # being sent is not currently in use on non-blocking CUDA streams.
        if any(cuda_send_frames):
            synchronize_stream(0)

        for each_frame in send_frames:
            await self.ep.send(each_frame)
        return sum(sizes)
    except ucp.exceptions.UCXBaseException:
        self.abort()
        raise CommClosedError("While writing, the connection was closed")
```
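As an aside on the framing above: the header is two plain `struct` packs, which a receiver unpacks symmetrically. A simplified, standalone round-trip of just that header (no real UCX endpoint involved; the values are made up):

```python
import struct

# Pack the header exactly as write() does: a close flag plus the frame
# count, then one bool per frame (is it CUDA?) and one uint64 per frame
# (its size in bytes).
nframes = 2
cuda_frames = (False, True)
sizes = (128, 4096)

header1 = struct.pack("?Q", False, nframes)
header2 = struct.pack(nframes * "?" + nframes * "Q", *cuda_frames, *sizes)

# The receiving side unpacks the same layout in the same order.
closed, n = struct.unpack("?Q", header1)
flags_and_sizes = struct.unpack(n * "?" + n * "Q", header2)
cuda_flags, frame_sizes = flags_and_sizes[:n], flags_and_sizes[n:]
assert (closed, n) == (False, 2)
assert cuda_flags == cuda_frames and frame_sizes == sizes
```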
There are several `await self.ep.send(...)` calls in `write`. Consider, for example, a send from the scheduler to a worker (dask/dask-mpi#1): even though all the other workers can perform their computation in parallel, they still have to wait sequentially for their communication with the scheduler. In cases where communication is heavier than computation, the overhead will be significant.
I’m wondering if there is any way to perform non-blocking send/recv by giving the scheduler more threads.
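For what it’s worth, the overlap being asked about does not necessarily require extra threads: on a single event loop, per-worker sends can be issued concurrently instead of one after another. A hedged sketch, where `send_to_worker` is a hypothetical stand-in for a comm’s `write`:

```python
import asyncio

async def send_to_worker(addr: str, msg: bytes) -> None:
    # Hypothetical stand-in for comm.write(); the await yields the
    # event loop while the transfer makes progress.
    await asyncio.sleep(0)

async def broadcast(addrs: list[str], msg: bytes) -> None:
    # Sequential version: total latency is the sum of all sends.
    #   for addr in addrs:
    #       await send_to_worker(addr, msg)
    # Concurrent version: sends overlap, bounded by the slowest one.
    await asyncio.gather(*(send_to_worker(a, msg) for a in addrs))
```

Whether the GIL or the comm backend then becomes the bottleneck is a separate question.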
Top GitHub Comments
Agreed, transferring this issue to distributed.
This seems to be a question more for the Distributed community. Dask-MPI is just the tool for launching a Dask cluster.