Explain or implement nested parallelism
Current behavior
Please describe how the feature works today
I am unable to figure out how to use map to parallelize nested loops, requiring two levels of fan-out/fan-in. I’m not sure if this is a documentation or an implementation issue.
For example (pseudo-code):

array_d2 = task_1()
vec = task_2()
u = zeros(array_d2.shape[0])
for i, row in enumerate(array_d2):
    s = task_3(vec, row)
    t = zeros(array_d2.shape[1])
    for j, cell in enumerate(row):
        t[j] = task_4(vec, s, cell)
    u[i] = task_5(vec, s, t)
v = task_6(u)
I guess that I would start task_3, task_4, and task_5 mapped; I would pass vec wrapped in unmapped to task_3, and both vec and s in unmapped to task_4. Should I then pass both vec and t in unmapped to task_5? That seems magical: unmapped would be used for both a reduce and a broadcast.
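For reference, a single level of this pattern is expressible today. Below is a minimal sketch of the outer fan-out/fan-in only, assuming Prefect 1.x with map and unmapped; the task bodies are made-up stand-ins, and only the wiring mirrors the pseudo-code above. The inner, per-cell level is exactly the part that remains unclear.

```python
from prefect import Flow, task, unmapped

# Stand-in task bodies (assumptions); only the wiring mirrors the question.
@task
def task_1():
    return [[1.0, 2.0], [3.0, 4.0]]          # plays the role of array_d2

@task
def task_2():
    return [0.5, 0.5]                        # plays the role of vec

@task
def task_3(vec, row):
    return sum(v * r for v, r in zip(vec, row))

@task
def task_6(u):
    return sum(u)

with Flow("single-level-map") as flow:
    array_d2 = task_1()
    vec = task_2()
    # Fan out: one task_3 run per row; vec is broadcast to every run.
    s = task_3.map(row=array_d2, vec=unmapped(vec))
    # Fan in: passing the mapped result to a non-mapped task reduces it to a list.
    v = task_6(s)
```

Running this with flow.run() (optionally on a Dask-based executor) parallelizes the per-row calls; the open question is how to add a second map over the cells of each row.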
Proposed behavior
Please describe your proposed change to the current behavior
Perhaps for a broadcast, use unmapped; for a (partial) reduce (as with t in task_5), use reduced. (The idea is that you read off the parallelism from any arguments that are neither unmapped nor reduced. However, the reduced fan-in has to trace the flow back to the fan-out to see how it relates to the marginal mapped arguments. For more complex situations, like 2D convolution, more explicit axis and slice specifications could be devised, but I’m not sure what the syntax should look like.)
Alternatively, document the current interface, however it deals with this, to make clear how to go about it. Even making clear how a low-level interface works (if there is one) would be useful.
Thanks!
Example
Please give an example of how the enhancement would be useful
I am translating parallel DAGs specified in another system to use Prefect as a backend executor. Nested parallelism obviously comes up quite often in data science flows.
Issue Analytics
- Created 3 years ago
- Comments: 8 (3 by maintainers)
Got it! So maybe your “unit of work” is, in fact, this small, in which case you may need to use the slightly more “naive” approach of iterated reduce / map steps.

Prefect’s ability to robustly manage states is due, in part, to our ability to know the shape of the DAG at compile time, with limited extensions at runtime (such as the cardinality of mapped tasks). That’s why it’s hard for Prefect to easily extend into more dynamic settings - each dimension of dynamism requires some new form of DAG introspection in order to plan the state management. It’s definitely not impossible - witness map and flat_map 😃 - but it’s more involved than just passing the information to Dask.

In the medium term, we will be adopting a backend that allows fully dynamic tasks, which will be a perfect match for your use case – Prefect will govern each item and Dask will execute it - but we’re not quite there yet. In the meantime, happy to support whichever approach makes the most sense for your work, and definitely let us know the design patterns that emerge - we’ll use them to improve this area of Prefect.
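One possible reading of the “iterated reduce / map” suggestion, sketched under the assumption of Prefect 1.x with map, unmapped, and flatten; the helper tasks (cells_of, repeat_per_cell, regroup) and all task bodies here are made-up stand-ins, not Prefect features.

```python
import numpy as np
from prefect import Flow, task, unmapped, flatten

# Stand-ins for task_1 ... task_6 from the original example (assumptions).
@task
def task_1():
    return np.arange(6.0).reshape(2, 3)      # array_d2

@task
def task_2():
    return np.ones(3)                        # vec

@task
def task_3(vec, row):
    return float(vec @ row)

@task
def task_4(vec, s, cell):
    return s * float(cell)

@task
def task_5(vec, s, t):
    return s + sum(t)

@task
def task_6(u):
    return sum(u)

# Helper tasks (not part of Prefect) that turn the nested loop into
# alternating map and reduce steps.
@task
def cells_of(row):
    return list(row)                         # the cells of one row

@task
def repeat_per_cell(s, row):
    return [s] * len(row)                    # align this row's s with its cells

@task
def regroup(flat, array_d2):
    # Reduce step: split the flat per-cell results back into per-row chunks.
    out, k = [], 0
    for row in array_d2:
        out.append(flat[k:k + len(row)])
        k += len(row)
    return out

with Flow("iterated-map-reduce") as flow:
    array_d2 = task_1()
    vec = task_2()
    s = task_3.map(row=array_d2, vec=unmapped(vec))        # map over rows
    cell_lists = cells_of.map(array_d2)                    # per-row cell lists
    s_lists = repeat_per_cell.map(s=s, row=array_d2)       # per-row copies of s
    t_flat = task_4.map(vec=unmapped(vec),
                        s=flatten(s_lists),
                        cell=flatten(cell_lists))          # one flat map over every cell
    t_rows = regroup(t_flat, array_d2)                     # reduce: regroup per row
    u = task_5.map(vec=unmapped(vec), s=s, t=t_rows)       # map over rows again
    v = task_6(u)                                          # final reduce
```

The extra flatten/regroup plumbing is the cost of expressing the inner dimension as a single flat map; whether that is worthwhile depends on how small the per-cell unit of work really is.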
Thanks!