
Explain or implement nested parallelism

See original GitHub issue

Current behavior

Please describe how the feature works today

I am unable to figure out how to use map to parallelize nested loops that require two levels of fan-out/fan-in. I’m not sure whether this is a documentation issue or an implementation issue.

For example (pseudo-code):

import numpy as np

array_d2 = task_1()                    # 2-D array
vec = task_2()
u = np.zeros(array_d2.shape[0])
for i, row in enumerate(array_d2):     # outer fan-out, one branch per row
    s = task_3(vec, row)
    t = np.zeros(array_d2.shape[1])
    for j, cell in enumerate(row):     # inner fan-out, one branch per cell
        t[j] = task_4(vec, s, cell)
    u[i] = task_5(vec, s, t)           # inner fan-in
v = task_6(u)                          # outer fan-in

I guess that I would start task_3, task_4 and task_5 mapped; I would pass vec wrapped in unmapped to task_3, and both vec and s in unmapped to task_4. Should I then pass both vec and t in unmapped to task_5? It seems magical to use unmapped for both reduce and broadcast.

Related Stack Overflow question
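
For concreteness, a minimal sketch of how the outer level maps onto Prefect’s 0.x-era API (the one current when this issue was filed), assuming task_1 through task_6 are @task-decorated. Because .map() calls cannot be nested directly, the inner loop is collapsed into process_row, a hypothetical helper written only for this sketch; calling another task’s .run() inside it invokes the plain undecorated function:

from prefect import Flow, task, unmapped

@task
def process_row(vec, s, row):
    # Hypothetical helper: the inner fan-out/fan-in runs serially
    # inside one mapped task, because .map() cannot be nested.
    t = [task_4.run(vec, s, cell) for cell in row]
    return task_5.run(vec, s, t)

with Flow("nested-parallelism") as flow:
    array_d2 = task_1()
    vec = task_2()
    s = task_3.map(unmapped(vec), array_d2)          # outer fan-out
    u = process_row.map(unmapped(vec), s, array_d2)  # one call per row
    v = task_6(u)                                    # outer fan-in

This trades the inner parallelism for a DAG whose shape is known up front, which is the tension discussed in the comments below.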

Proposed behavior

Please describe your proposed change to the current behavior

Perhaps, for broadcast, use unmapped; for a (partial) reduce (as with t in task_5), use a new reduced wrapper. (The idea is that you read off the parallelism from any arguments that are neither unmapped nor reduced. However, the reduced fan-in has to trace the flow back to the fan-out to see how it relates to the marginal mapped arguments. For more complex situations, like 2D convolution, more explicit axis and slice specifications could be devised, but I’m not sure what the syntax should look like.)
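
A hypothetical sketch of what that call site could look like (reduced does not exist in Prefect; it is precisely the proposal above):

# Hypothetical syntax: `unmapped` broadcasts one value to every
# mapped call; a `reduced` wrapper would gather a mapped result
# back into a single argument, with Prefect tracing t back to its
# fan-out to find the axis to collapse.
u = task_5.map(unmapped(vec), s, reduced(t))   # t: inner map over cells
v = task_6(u)                                  # ordinary full fan-in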

Alternatively, document the current interface, however it handles this case, to make clear how to go about it. Even documenting how a low-level interface works (if there is one) would be useful.

Thanks!

Example

Please give an example of how the enhancement would be useful

I am translating parallel DAGs specified in another system to use Prefect as a backend executor. Nested parallelism obviously comes up quite often in data science flows.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

1 reaction
jlowin commented, Jun 17, 2020

Got it! So maybe your “unit of work” is, in fact, this small, in which case you may need to use the slightly more “naive” approach of iterated reduce / map steps. Prefect’s ability to robustly manage states is due, in part, to our ability to know the shape of the DAG at compile time, with limited extensions at runtime (such as the cardinality of mapped tasks). That’s why it’s hard for Prefect to easily extend into more dynamic settings - each dimension of dynamism requires some new form of DAG introspection in order to plan the state management. It’s definitely not impossible - witness map and flat_map 😃 - but it’s more involved than just passing the information to Dask. In the medium term, we will be adopting a backend that allows fully dynamic tasks, which will be a perfect match for your use case – Prefect will govern each item and Dask will execute it, but we’re not quite there yet. In the meantime, happy to support whichever approach makes the most sense for your work - and definitely let us know the design patterns that emerge; we’ll use them to improve this area of Prefect.
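
A sketch of the iterated map / flat-map approach mentioned above, assuming Prefect 0.12+, where flatten lets a downstream map run over the concatenation of a list of lists; split_row and repeat_s are hypothetical helpers introduced here so the per-row s stays paired with each cell:

from prefect import Flow, task, unmapped, flatten

@task
def split_row(row):
    # Hypothetical helper: returns one row's cells as a list, so the
    # mapped result is a list of lists.
    return list(row)

@task
def repeat_s(s, row):
    # Hypothetical helper: repeat the per-row s once per cell so that
    # flattening keeps s aligned with its row's cells.
    return [s] * len(row)

with Flow("flat-map") as flow:
    array_d2 = task_1()
    vec = task_2()
    s = task_3.map(unmapped(vec), array_d2)
    cells = split_row.map(array_d2)
    s_per_cell = repeat_s.map(s, array_d2)
    # flatten(...) concatenates the per-row lists, so task_4 fans out
    # over every cell in a single flat map.
    t = task_4.map(unmapped(vec), flatten(s_per_cell), flatten(cells))

Fanning back in per row for task_5 would still require regrouping t by row index, which is exactly the extra DAG bookkeeping described above.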

0 reactions
shaunc commented, Jun 22, 2020

Thanks!

Read more comments on GitHub

Top Results From Across the Web

Chapter 4 Nested Parallelism (Sun Studio 12: OpenMP API ...
OpenMP parallel regions can be nested inside each other. If nested parallelism is disabled, then the new team created by a thread encountering...

Nested Parallelism - an overview | ScienceDirect Topics
The most obvious and intuitive method of implementing nested parallelism within an OpenMP code is to simply nest two (or more) OpenMP parallel...

Nested Parallelism
All the loops are parallel, but none is adequately large enough to employ all the threads. Further, the variability in execution time means...

OpenMP: What is the benefit of nesting parallelizations?
Nested parallelism is for those cases where the parallelism isn't all exposed at once -- say you want to do 2 simultaneous function...

Nested Parallelism - Intro to Parallel Programming - YouTube
This video is part of an online course, Intro to Parallel Programming. Check out the course here: https://www.udacity.com/course/cs344.
