DISCUSSION: map_overlap with reduction functions

Today @jsignell, @jni, and I had a conversation about wanting to run map_overlap with a function that performs some kind of reduction.

cc @jakirkham

Background

We wanted to talk about how to make processing data with Dask a little more friendly at the reduction stage. For example: after processing steps that don’t change the array shape (filters, etc.), computing stuff per object (extra difficulty: using slightly overlapping chunks). I feel like I’m doing a lot of weird workarounds when I do this, which seems like a sign we could improve it.

Relevant:

Summary of our conversation today

  • We don’t just want to be restricted to numpy or pandas output from the inner function. Juan makes the case for an example that outputs networkx graphs (which can be joined with x.compose). Gen has previously run into issues with functions that output lists (in her case she could change the lists to numpy arrays, but depending on what’s in the lists that won’t always be possible).
  • Julia explained the difference between combine and aggregate, using an example of calculating the mean (thanks Julia!)
  • We talked about the differences between map_overlap, map_blocks and blockwise.
    • We’re not sure about the history of when map_blocks vs blockwise were introduced & changed/updated.
    • Blockwise is not commonly known about or used, especially among people in the life science community. Juan says that when he read the docstring and saw details about Einstein notation, he assumed it wasn’t relevant to him.
  • We talked a lot about whether:
    • changes should be folded in to the existing map_overlap function, or
    • maybe a reduction_overlap function is a better fit, or
    • … something else? There wasn’t a clear outcome from this discussion, according to my notes (jump in if I’m wrong). Possibly the best way to do things will become more clear later on.
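
The combine/aggregate distinction from the mean example above can be sketched in plain Python. This is only an illustration, not Dask code: the names chunk, combine and aggregate mirror the parameters of dask.array.reduction, but the functions, the dict layout and the sample blocks are assumptions made up for this sketch. In Dask itself these steps receive groups of partial results rather than exactly two.

```python
from functools import reduce

def chunk(block):
    # Per-block step: shrink the raw values to a small partial result.
    return {"total": sum(block), "n": len(block)}

def combine(a, b):
    # Merge two partial results. The output has the same form as the inputs,
    # so this step can be repeated as many times as needed (e.g. in a tree).
    return {"total": a["total"] + b["total"], "n": a["n"] + b["n"]}

def aggregate(partial):
    # Final step: runs once and turns the merged partial result into the answer.
    return partial["total"] / partial["n"]

# Hypothetical unevenly sized blocks of one 1-D dataset.
blocks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

partials = [chunk(b) for b in blocks]
mean = aggregate(reduce(combine, partials))

# Averaging the per-block means directly gives a different (wrong) number,
# which is why the mean has to carry totals and counts until the last step.
naive = sum(sum(b) / len(b) for b in blocks) / len(blocks)
```

The point of keeping combine separate is that it must be safe to apply repeatedly, while aggregate is allowed to throw information away (here, the count).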

Links that were shared:

  • The concatenate keyword argument in reduce. Julia suggests maybe this keyword argument could be used to pass in a function (like the networkx compose function we talked about), rather than just being a simple boolean.

Plan going forward

1. Proof of concept / toy example (Gen & Juan)

We’ll use the example of a function that takes in an (overlapped) image chunk and returns a networkx graph. Juan suggests we could hack the output we want by:

  • Writing a small toy example with networkx output
    • Preferred: adapt the example from Elegant SciPy chapter 3
    • Alternatively, we could adapt the scikit-image region adjacency graph example
  • Expanding the chunks to get some overlap: map_overlap given the identity function with trim=False
  • Using map_blocks or reduction to get the final desired result

Note: now that I think about this more, I’m pretty sure that one chunk plus some overlap does not come out as a single, bigger dask array chunk. I think we actually get lots of small edge chunk pieces tacked on the outside. So perhaps we’ll need a rechunking operation in the middle to make this happen.
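
To make the idea behind this plan concrete, here is a minimal plain-Python stand-in, not the Dask implementation. It assumes a 1-D list of labels in place of an image, a set of edges in place of a networkx graph, and set union in place of compose; the helper names overlapped_chunks and adjacency_edges are hypothetical. It only shows why the overlap matters when the per-chunk output is not an array.

```python
def overlapped_chunks(labels, size, depth):
    # Yield each chunk of `size` elements, extended by up to `depth`
    # elements borrowed from the neighbouring chunks on either side.
    for start in range(0, len(labels), size):
        lo = max(start - depth, 0)
        hi = min(start + size + depth, len(labels))
        yield labels[lo:hi]

def adjacency_edges(chunk):
    # Per-chunk function with non-array output: the set of edges between
    # neighbouring elements that carry different labels.
    return {frozenset(pair) for pair in zip(chunk, chunk[1:]) if pair[0] != pair[1]}

def merged_edges(labels, size, depth):
    # "Reduction" step: union the per-chunk edge sets into one result.
    per_chunk = (adjacency_edges(c) for c in overlapped_chunks(labels, size, depth))
    return set().union(*per_chunk)

labels = [1, 1, 2, 2, 3, 3, 3, 4]

with_overlap = merged_edges(labels, size=2, depth=1)
without_overlap = merged_edges(labels, size=2, depth=0)
```

With depth=0, edges between labels that meet exactly at a chunk boundary are lost; with depth=1 they are found, sometimes by two neighbouring chunks at once, which is harmless here because a set union (like a graph compose) ignores duplicates.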

2. Integration with the dask library (Gen, Julia & John, probably)

The specific details of this will depend on what we find works best with our proof of concept.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 16 (16 by maintainers)

Top GitHub Comments

jakirkham commented, Jun 8, 2021 (1 reaction)

If we do document some pattern, maybe it should live here or perhaps on a subsequent related page

mrocklin commented, Jun 8, 2021 (1 reaction)

da.overlap.overlap_internal does what you’re doing in that line. There are other functions, like trim, that could also be made more visible.

On Tue, Jun 8, 2021 at 10:33 AM Julia Signell @.***> wrote:

Or, if the first line is unpleasant, then maybe we want to break open the overlap module a bit more so that some of the component functions are more accessible?

Yeah I think that may be it - the first line feels like an antipattern. There probably already is a better way to just get chunks with the overlap from neighbors, but if not, that could be a nice approach.

