DISCUSSION: map_overlap with reduction functions
Today @jsignell, @jni and I had a conversation about wanting to run map_overlap with a function that creates some kind of reduction.
cc @jakirkham
Background
We wanted to talk about how to make processing data with Dask a little more friendly at the reduction stage. For example: after processing steps that don’t change the array shape (filters, etc.), we then want to compute things per object (with the added difficulty of using slightly overlapping chunks). I feel like I’m doing a lot of weird workarounds when I do this, which seems like a sign we could improve it.
Relevant:
- dask issue: “Applying a function to each chunk in a Dask array and combining the results”
- gist: Genevieve’s distributed skeleton analysis
- gist: Gen’s slices_from_chunks_overlap function
- dask PR (closed): “Add blockwise example with ragged size outputs”
 
Summary of our conversation today
- We don’t just want to be restricted to numpy or pandas output from the inner function. Juan makes the case for an example that outputs networkx graphs (which can be joined with `x.compose`). Gen has previously run into issues with functions that output lists (in her case she could change the lists to numpy arrays, but depending on what’s in the lists that won’t always be possible).
- Julia explained the difference between `combine` and `aggregate`, using an example of calculating the mean (thanks Julia!)
- We talked about the differences between `map_overlap`, `map_blocks` and `blockwise`.
  - We’re not sure about the history of when `map_blocks` vs `blockwise` were introduced & changed/updated.
  - Blockwise is not commonly known about or used, especially among people in the life science community. Juan says when he read the docstring and saw details about einstein notation he assumed it wasn’t relevant to him.
- We talked a lot about whether:
  - changes should be folded in to the existing `map_overlap` function, or
  - maybe a `reduction_overlap` function is a better fit, or
  - … something else?

  There wasn’t a clear outcome from this discussion, according to my notes (jump in if I’m wrong). Possibly the best way to do things will become more clear later on.
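To make the combine/aggregate distinction from the summary above a bit more concrete, here is a minimal plain-Python sketch of a mean computed in stages. It does not call dask at all: the function names `chunk`, `combine` and `aggregate` only mirror the roles discussed, and the data and the grouping into two levels are made up for illustration.

```python
def chunk(block):
    """Per-block step: reduce one block to a partial result (sum, count)."""
    return (sum(block), len(block))


def combine(partials):
    """Intermediate step: merge several partials into ONE partial.

    The output has the same form as the input items, so this step
    can be repeated as many times as the reduction tree needs.
    """
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return (total, count)


def aggregate(partials):
    """Final step: merge the remaining partials and produce the answer."""
    total, count = combine(partials)
    return total / count


# Three "chunks" of unequal size (illustrative data only).
blocks = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

partials = [chunk(b) for b in blocks]

# One intermediate level of the tree: combine neighbours first ...
level1 = [combine(partials[:2]), combine(partials[2:])]

# ... and only divide at the very end.
mean = aggregate(level1)
```

The point of keeping `combine` separate is that it never divides: dividing early and then averaging the per-chunk means would give the wrong answer when chunks have different sizes.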
 
Links that were shared:
- The concatenate keyword argument in reduce. Julia suggests maybe this keyword argument could be used to pass in a function (like the networkx compose function we talked about), rather than just being a simple boolean.
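One way to picture the idea of passing a function rather than a boolean is a tree reduction where the joining step is pluggable. The sketch below is plain Python and is not dask's API: `tree_reduce`, its `join` and `split_every` parameters, and the dict-of-sets graph representation (a stand-in for networkx graphs) are all assumptions made purely for illustration.

```python
def compose(g, h):
    """Union of two graphs stored as {node: set_of_neighbours}.

    Stand-in for a graph-joining function; returns a new dict and
    leaves both inputs unmodified.
    """
    out = {node: set(nbrs) for node, nbrs in g.items()}
    for node, nbrs in h.items():
        out.setdefault(node, set()).update(nbrs)
    return out


def tree_reduce(partials, join, split_every=2):
    """Join groups of `split_every` partial results until one remains.

    `join` is any binary function (hypothetical hook): list
    concatenation, graph composition, and so on.
    """
    partials = list(partials)
    if not partials:
        raise ValueError("need at least one partial result")
    if split_every < 2:
        raise ValueError("split_every must be at least 2")
    while len(partials) > 1:
        next_level = []
        for i in range(0, len(partials), split_every):
            group = partials[i:i + split_every]
            merged = group[0]
            for item in group[1:]:
                merged = join(merged, item)
            next_level.append(merged)
        partials = next_level
    return partials[0]


# Per-chunk graphs that share nodes where chunks overlap (made-up data).
chunk_graphs = [
    {"a": {"b"}, "b": {"a"}},
    {"b": {"c"}, "c": {"b"}},
    {"c": {"d"}, "d": {"c"}},
]
whole_graph = tree_reduce(chunk_graphs, join=compose)

# The same machinery works for list outputs.
flat = tree_reduce([[1], [2, 3], [4]], join=lambda a, b: a + b)
```

Because the only requirement on `join` is that it takes two partial results and returns one, the per-chunk output type is not tied to numpy or pandas.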
 
Plan going forward
1. Proof of concept / toy example (Gen & Juan)
We’ll use the example of a function that takes in an (overlapped) image chunk and returns a networkx graph. Juan suggests we could hack the output we want by:
* Writing a small toy example with networkx output
  * Preferred: adapt the example from Elegant SciPy chapter 3
  * Alternatively, we could adapt the scikit-image region adjacency graph example
* Expanding the chunks to get some overlap (`map_overlap` given the identity function with `trim=False`)
* Using `map_blocks` or `reduction` to get the final desired result
Note: now that I think about this more, I’m pretty sure that one chunk plus some overlap does not come out as a single, bigger dask array chunk. I think we actually get lots of small edge chunk pieces tacked on the outside. So perhaps we’ll need a rechunking operation in the middle to make this happen.
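As a rough illustration of what "one chunk plus some overlap" means in index terms, the sketch below computes one expanded slice per chunk. `overlapped_slices` is a hypothetical helper written for this note (it is not the function from Gen's gist and not part of dask). It assumes the same `depth` on every axis and no padding at the array boundary, so edge chunks only grow on their interior sides.

```python
from itertools import accumulate, product


def overlapped_slices(chunks, depth):
    """Return one tuple of slices per chunk, expanded by `depth`.

    `chunks` is a tuple of per-axis block sizes, in the same layout as
    a dask array's `.chunks`, e.g. ((4, 4), (4, 4)) for an 8x8 array
    split into four 4x4 blocks. Slices are clipped to the array bounds.
    """
    per_axis = []
    for sizes in chunks:
        stops = list(accumulate(sizes))
        starts = [stop - size for stop, size in zip(stops, sizes)]
        length = stops[-1]
        per_axis.append([
            slice(max(start - depth, 0), min(stop + depth, length))
            for start, stop in zip(starts, stops)
        ])
    # The Cartesian product over axes gives every chunk, in row-major order.
    return list(product(*per_axis))


slices = overlapped_slices(((4, 4), (4, 4)), depth=1)

# Shape of each expanded chunk, derived from its slices.
shapes = [tuple(s.stop - s.start for s in block) for block in slices]
```

Each tuple of slices could be used to index the full array and hand one contiguous, overlapped block to the per-chunk function, which sidesteps the question of how the overlapped pieces are laid out as chunks.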
2. Integration with the dask library (Gen, Julia & John, probably)
The specific details of this will depend on what we find works best with our proof of concept.

If we do document some pattern, maybe it should live here or perhaps on a subsequent related page
da.overlap.overlap_internal does what you’re doing in that line. There are other functions, like trim, that could also be made more visible
On Tue, Jun 8, 2021 at 10:33 AM Julia Signell @.***> wrote: