Multiple ThreadPoolExecutors
See original GitHub issue
(I think that I've raised this before, but I couldn't find it. I suspect it was part of commentary on another issue rather than a standalone issue itself.)
Today we run all tasks in a ThreadPoolExecutor living at Worker.executor. We default the size of this executor to the number of logical CPU cores on the machine. This works great most of the time, but there are some cases where we would like something different:
- I/O-bound tasks could run on the event loop itself, or on a separate Tornado-based AsyncExecutor
- For GPU tasks we would prefer a separate executor with a single thread (or, in the near future, a few threads)
- For noxious tasks that leak memory, folks have asked for a separate ProcessPoolExecutor
- Some folks have asked for a special executor for resource-restricted tasks
- Actors already run on their own executor today
In practice, the GPU pool is probably the most common case today.
So perhaps we should encode multiple executors into the Worker and split tasks between them based on annotations/resources/GPU flags, for example:
executor = self.executors[task.executor or "cpu"]
self.submit_on_executor(executor, task, *args, **kwargs)
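A rough sketch of how that could look, using only standard-library executors. The class name, the executor names, and the `executor` attribute on the task are illustrative assumptions here, not the actual distributed API:

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

class WorkerSketch:
    """Illustrative only: a worker that keeps several named executors."""

    def __init__(self, n_cores: int):
        self.executors = {
            "cpu": ThreadPoolExecutor(n_cores),     # today's default pool, sized to cores
            "gpu": ThreadPoolExecutor(1),           # serialize work onto a single device
            "offload": ProcessPoolExecutor(2),      # isolate tasks that leak memory
            "io": ThreadPoolExecutor(4 * n_cores),  # oversubscribe for blocking I/O
        }

    def submit(self, task, *args, **kwargs):
        # Route on a per-task hint (annotation/resource/GPU flag),
        # falling back to the CPU pool when nothing is specified.
        executor = self.executors[getattr(task, "executor", None) or "cpu"]
        return executor.submit(task, *args, **kwargs)
```

Actors and event-loop work could get their own entries in the same mapping, and the routing stays a one-line lookup.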
cc @dask/gpu
Top GitHub Comments
@jakirkham: that’s already the case, and you could set the fsspec backend’s loop to be the one it needs to be; but zarr will still do its part of the decoding synchronously. You’d have to pass the filters down to the storage layer and replicate the work there, but then it would no longer be pure I/O.
I think a rewrite in which we can fetch multiple blocks of bytes in a single task and pass them to a separate dataframe-making task (without concat!) would work well for CSV. Parquet, and just about anything else where we don’t pass bytes around, is more complicated. Fastparquet, for example, isn’t interested in running in multiple threads the way arrow can, because “dask can solve that case” (not that it does a good job of releasing the GIL).
Note that the PR I linked above for fastparquet improved dataset open time by 10x on s3 without _metadata (one of the test datasets with many files).
Yeah, to be clear, I’m saying that if we were to change how dask collections handle I/O by moving read_bytes calls into fully separable tasks, then we could take advantage of this. You had mentioned this in the past, I think. It wouldn’t work for Zarr, you’re right, because that abstraction hides I/O from us, but it could work for Parquet, CSV, and others if we wanted to make that explicit split. I’m not suggesting that we do this today, or any time in the moderate future.
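For concreteness, the split described in these comments might look something like the sketch below for the CSV case. This is only a sketch: it assumes fsspec and pandas, ignores header and line-boundary handling, and `fetch_blocks`/`make_dataframe` are made-up names rather than existing dask functions.

```python
import io

import fsspec
import pandas as pd
from dask import delayed


@delayed
def fetch_blocks(url, byte_ranges):
    # Pure I/O: read several byte ranges of one file in a single task.
    # This is the part that could run on an I/O-oriented executor.
    with fsspec.open(url, "rb") as f:
        pieces = []
        for start, end in byte_ranges:
            f.seek(start)
            pieces.append(f.read(end - start))
    return b"".join(pieces)


@delayed
def make_dataframe(raw_bytes):
    # Pure CPU: parse the fetched bytes into a single DataFrame,
    # with no concatenation of partial frames.
    return pd.read_csv(io.BytesIO(raw_bytes))


# Illustrative usage: one I/O task and one parsing task per output partition.
# parts = [make_dataframe(fetch_blocks(url, ranges)) for ranges in partition_ranges]
```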