
Asynchronous Disk Access in Workers


Currently, reading from or writing to disk blocks the event loop:

https://github.com/dask/distributed/blob/74e8dc64ef0436147a88ba225604ed7f86b0d569/distributed/worker.py#L1948-L1964

This can cause workers to become unresponsive, especially on systems with very slow disk access. Ideally, disk I/O would happen concurrently. There are a couple of ways to do this.
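For context, here is a minimal sketch, in the spirit of the zict-style buffer behind Worker.data but not its actual code, of why data[key] is a blocking call when it has to hit disk:

```python
# Toy spill buffer: values above a byte target go to pickle files on disk,
# and __getitem__ reads them back synchronously. Calling data[key] from a
# coroutine therefore blocks the event loop for the whole disk read.
import os
import pickle


class SpillBuffer:
    def __init__(self, directory, target_bytes):
        self.fast = {}                 # in-memory values
        self.directory = directory     # spill location on disk
        self.target = target_bytes
        self.nbytes = 0

    def __setitem__(self, key, value):
        blob = pickle.dumps(value)
        if self.nbytes + len(blob) <= self.target:
            self.fast[key] = value
            self.nbytes += len(blob)
        else:
            path = os.path.join(self.directory, str(key))
            with open(path, "wb") as f:
                f.write(blob)          # synchronous disk write

    def __getitem__(self, key):
        if key in self.fast:
            return self.fast[key]
        path = os.path.join(self.directory, str(key))
        with open(path, "rb") as f:
            return pickle.loads(f.read())  # synchronous disk read
```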

Offload to separate thread

We could move all manipulation of the Worker.data MutableMapping to a separate thread, as we do with the offload function, which we use today for deserialization.

However, if we do this, then we need to do it for every access to Worker.data, including seemingly innocuous checks like if key in self.data, which may become annoying.
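For illustration, a rough sketch of what the offload approach could look like, assuming a dedicated single-threaded executor similar in spirit to distributed’s offload helper (names and wiring here are illustrative, not the current implementation):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# One thread that owns all access to the spill buffer.
_disk_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="Disk-IO")


async def offload(fn, *args):
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_disk_executor, fn, *args)


async def get_data(data, key):
    # If the mapping is mutated from the I/O thread, even cheap checks like
    # `key in data` have to be routed through that thread to stay consistent,
    # which is exactly the annoyance described above.
    if await offload(data.__contains__, key):
        return await offload(data.__getitem__, key)
    raise KeyError(key)
```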

Handle disk logic directly in the worker

We could also break apart the MutableMapping abstraction and unpack the zict logic directly into the Worker code. This would allow us to keep a lot of the fast access on the event loop while treating disk access specially. It would also open the door for more performance improvements, like trying to schedule tasks for data that is currently in memory rather than data that is currently on disk. In general, if we want to improve out-of-memory handling in Dask, we’ll eventually need to break this abstraction.

However, breaking this abstraction comes at considerable cost. First, it means that there is more to manage in a monolithic Worker codebase (zict has tricky logic that we haven’t really had to touch or maintain in years). Second, it means that we’ll have to find a way that still lets other groups like RAPIDS extend the storage hierarchy (they have device->host->disk rather than just host->disk).
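For illustration, a rough sketch of what unpacking the fast/slow tiers into the worker might look like; the names are hypothetical, and a real version would also need asynchronous spilling and an extensible hierarchy for cases like device->host->disk:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


class WorkerStorage:
    def __init__(self, disk_store):
        self.fast = {}            # in-memory values; cheap to touch on the loop
        self.slow = disk_store    # any synchronous key -> value-on-disk mapping
        self._io = ThreadPoolExecutor(max_workers=1)

    def __contains__(self, key):
        # Membership checks stay synchronous: no disk read required.
        return key in self.fast or key in self.slow

    def keys_in_memory(self):
        # Exposing the fast tier enables e.g. preferring to schedule tasks
        # on data that is already in memory rather than on disk.
        return set(self.fast)

    async def get(self, key):
        if key in self.fast:
            return self.fast[key]
        loop = asyncio.get_running_loop()
        # Only the actual disk read leaves the event loop.
        return await loop.run_in_executor(self._io, self.slow.__getitem__, key)
```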

cc @quasiben @pentschev @jrbourbeau @fjetter

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 27 (19 by maintainers)

Top GitHub Comments

mrocklin commented, Mar 7, 2022 (1 reaction)

When Gabe and I agree, you know we’re on to something 🙂

gjoseph92 commented, Mar 3, 2022 (1 reaction)

Recent workloads I’ve run have made me think that streaming spilled bytes straight from disk to other workers (transferring spilled data without either deserializing it or loading all the bytes into memory) needs to be a core part of this design.

I made a simple workload that resulted in spilling lots of data to disk (basically df - df.mean(), where df was larger than the cluster). We all know spilling to disk is often the death knell of a cluster, so I expected this to blow up. But it actually ran great, because the tasks that needed the spilled data always ran on the workers where that data had already been spilled. No transfers were necessary. Additionally, the event loop being blocked by disk I/O might not have been as big a deal, since the workers didn’t have much communication to do anyway (just with the scheduler).

[Dashboard screenshot: same-worker-uses-spilled-data]

Notice there are lots of disk reads but few transfers and little white space; the task stream looks great.

Compare this to an (a @ a.T).mean(), which blows up the cluster and grinds to a halt. These tasks fundamentally require more inputs and therefore more memory, true. But in this case, data that’s been spilled on one worker needs to be transferred to another worker. Because doing this un-spills that data, the worker is playing memory whack-a-mole: it’s reading data back into memory for transfers about as fast as (maybe even faster than?) it’s dumping it to disk. And the blocking of the event loop is gumming up these data transfers to other workers, slowing things down even more.

[Dashboard screenshot: spilled-data-transferred-to-different-worker]
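For reference, a hedged reconstruction of the two workloads contrasted above, written with dask.array for brevity (the first was originally a dataframe computation); the sizes are illustrative and would need to exceed cluster memory to trigger spilling:

```python
import dask.array as da

# Workload 1 (ran well): center a large collection. Each spilled block is
# consumed by tasks on the same worker, so spilled data is rarely transferred.
x = da.random.random((400_000, 50_000), chunks=(5_000, 50_000))
result1 = (x - x.mean()).sum().compute()

# Workload 2 (ground to a halt): a matrix product combines blocks from many
# workers, so spilled blocks are read back into memory just to be transferred.
result2 = (x @ x.T).mean().compute()
```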

I’d just never seen spilling work well like it did in the first example, which made me realize that maybe the biggest problems aren’t so much with spilling itself, but with how spilling interacts with data transfer. So I think it’s essential that we make that a core part of any redesign.
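A minimal sketch of the streaming idea raised at the top of this comment: serve a spilled key by sending its serialized bytes from the spill file in fixed-size chunks, never deserializing the value or holding it all in memory. The spill_path argument and comm.write_chunk method are hypothetical names, not distributed’s API:

```python
import asyncio


async def stream_spilled_value(spill_path, comm, chunk_size=4 * 2**20):
    loop = asyncio.get_running_loop()
    with open(spill_path, "rb") as f:
        while True:
            # Do each blocking read in a thread so the event loop stays free.
            chunk = await loop.run_in_executor(None, f.read, chunk_size)
            if not chunk:
                break
            await comm.write_chunk(chunk)  # hypothetical comm method
```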
