Explain or implement nested parallelism
Current behavior
Please describe how the feature works today
I am unable to figure out how to use map to parallelize nested loops, requiring two levels of fan-out/fan-in. I’m not sure if this is a documentation or an implementation issue.
For example (pseudo-code):

array_d2 = task_1()
vec = task_2()
u = zeros(array_d2.shape[0])
for i, row in enumerate(array_d2):
    s = task_3(vec, row)
    t = zeros(array_d2.shape[1])
    for j, cell in enumerate(row):
        t[j] = task_4(vec, s, cell)
    u[i] = task_5(vec, s, t)
v = task_6(u)
I guess that I would start task_3, task_4, and task_5 mapped; I would pass vec wrapped in unmapped to task_3, and both vec and s in unmapped to task_4. Should I then pass both vec and t in unmapped to task_5? That seems magical: unmapped would be used for both a reduce and a broadcast.
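For reference, a single level of this pattern is expressible today. Below is a minimal sketch of the outer fan-out/fan-in only, assuming Prefect 1.x with map and unmapped; the task bodies are made-up stand-ins, and only the wiring mirrors the pseudo-code above. The inner, per-cell level is exactly the part that remains unclear.

```python
from prefect import Flow, task, unmapped

# Stand-in task bodies (assumptions); only the wiring mirrors the question.
@task
def task_1():
    return [[1.0, 2.0], [3.0, 4.0]]          # plays the role of array_d2

@task
def task_2():
    return [0.5, 0.5]                        # plays the role of vec

@task
def task_3(vec, row):
    return sum(v * r for v, r in zip(vec, row))

@task
def task_6(u):
    return sum(u)

with Flow("single-level-map") as flow:
    array_d2 = task_1()
    vec = task_2()
    # Fan out: one task_3 run per row; vec is broadcast to every run.
    s = task_3.map(row=array_d2, vec=unmapped(vec))
    # Fan in: passing the mapped result to a non-mapped task reduces it to a list.
    v = task_6(s)
```

Running this with flow.run() (optionally on a Dask-based executor) parallelizes the per-row calls; the open question is how to add a second map over the cells of each row.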
Proposed behavior
Please describe your proposed change to the current behavior
Perhaps for a broadcast, use unmapped; for a (partial) reduce (as with t in task_5), use reduced. (The idea is that you read off the parallelism from any arguments that are neither unmapped nor reduced. However, the reduced fan-in has to trace the flow back to the fan-out to see how it relates to the marginal mapped arguments. For more complex situations, like 2D convolution, more explicit axis and slice specifications could be devised, but I’m not sure what the syntax should look like.)
Alternatively, document the current interface, however it deals with this, to make clear how to go about it. Even making clear how a low-level interface works (if there is one) would be useful.
Thanks!
Example
Please give an example of how the enhancement would be useful
I am translating parallel DAGs specified in another system to use Prefect as a backend executor. Nested parallelism obviously comes up quite often in data science flows.
Issue Analytics
- Created 3 years ago
- Comments: 8 (3 by maintainers)
Got it! So maybe your “unit of work” is, in fact, this small, in which case you may need to use the slightly more “naive” approach of iterated reduce / map steps.

Prefect’s ability to robustly manage states is due, in part, to our ability to know the shape of the DAG at compile time, with limited extensions at runtime (such as the cardinality of mapped tasks). That’s why it’s hard for Prefect to easily extend into more dynamic settings - each dimension of dynamism requires some new form of DAG introspection in order to plan the state management. It’s definitely not impossible - witness map and flat_map 😃 - but it’s more involved than just passing the information to Dask.

In the medium term, we will be adopting a backend that allows fully dynamic tasks, which will be a perfect match for your use case – Prefect will govern each item and Dask will execute it - but we’re not quite there yet. In the meantime, happy to support whichever approach makes the most sense for your work, and definitely let us know the design patterns that emerge - we’ll use them to improve this area of Prefect.
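One possible reading of the “iterated reduce / map” suggestion, sketched under the assumption of Prefect 1.x with map, unmapped, and flatten; the helper tasks (cells_of, repeat_per_cell, regroup) and all task bodies here are made-up stand-ins, not Prefect features.

```python
import numpy as np
from prefect import Flow, task, unmapped, flatten

# Stand-ins for task_1 ... task_6 from the original example (assumptions).
@task
def task_1():
    return np.arange(6.0).reshape(2, 3)      # array_d2

@task
def task_2():
    return np.ones(3)                        # vec

@task
def task_3(vec, row):
    return float(vec @ row)

@task
def task_4(vec, s, cell):
    return s * float(cell)

@task
def task_5(vec, s, t):
    return s + sum(t)

@task
def task_6(u):
    return sum(u)

# Helper tasks (not part of Prefect) that turn the nested loop into
# alternating map and reduce steps.
@task
def cells_of(row):
    return list(row)                         # the cells of one row

@task
def repeat_per_cell(s, row):
    return [s] * len(row)                    # align this row's s with its cells

@task
def regroup(flat, array_d2):
    # Reduce step: split the flat per-cell results back into per-row chunks.
    out, k = [], 0
    for row in array_d2:
        out.append(flat[k:k + len(row)])
        k += len(row)
    return out

with Flow("iterated-map-reduce") as flow:
    array_d2 = task_1()
    vec = task_2()
    s = task_3.map(row=array_d2, vec=unmapped(vec))        # map over rows
    cell_lists = cells_of.map(array_d2)                    # per-row cell lists
    s_lists = repeat_per_cell.map(s=s, row=array_d2)       # per-row copies of s
    t_flat = task_4.map(vec=unmapped(vec),
                        s=flatten(s_lists),
                        cell=flatten(cell_lists))          # one flat map over every cell
    t_rows = regroup(t_flat, array_d2)                     # reduce: regroup per row
    u = task_5.map(vec=unmapped(vec), s=s, t=t_rows)       # map over rows again
    v = task_6(u)                                          # final reduce
```

The extra flatten/regroup plumbing is the cost of expressing the inner dimension as a single flat map; whether that is worthwhile depends on how small the per-cell unit of work really is.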
Thanks!