
Issue warning when map_blocks() function axis arguments conflict with known dask array chunk structure

See original GitHub issue

Hello, I recently ran into this issue and wanted to suggest issuing a warning when the arguments of a function mapped onto a dask array could yield unexpected or undesirable behavior given the array's known chunk structure. I provide an example below.

Minimal example: I want to horizontally stack multiple 1-d dask arrays and argsort each row of the result (axis=1).

import numpy as np
import dask.array as da

# column vectors
array1 = da.from_array(np.array([5, 9, 1, 0]).reshape((-1, 1)))
array2 = da.from_array(np.array([12, -9, 15, 0]).reshape((-1, 1)))
array3 = da.from_array(np.array([90, -3, 3, 16]).reshape((-1, 1)))

# horizontally stack
combined_array = da.hstack([array1, array2, array3])

# argsort
combined_array.map_blocks(np.argsort, axis=1).compute()

Unexpected/undesired output:

array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

This output is unexpected and undesirable. The hstacked array remains chunked along the columns, so np.argsort receives each (4, 1) column chunk separately; sorting a single column along axis=1 trivially returns all zeros:

[Image: chunk structure of the hstacked array, showing three separate single-column chunks]
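
The chunk layout can be confirmed by inspecting the array's chunks attribute (the values shown follow from the example arrays above):

combined_array.chunks
# ((4,), (1, 1, 1)) -- one chunk along the rows; three single-column chunks along the columns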

I resolved the issue by rechunking the stacked array so that each full row is contained in a single chunk:

combined_array = combined_array.rechunk({1: combined_array.shape[1]})  # put each full row in a single chunk
combined_array.map_blocks(np.argsort, axis=1).compute()

Desired output:

array([[0, 1, 2],
       [1, 2, 0],
       [0, 2, 1],
       [0, 1, 2]])
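
Inspecting the chunk structure again confirms that the column chunks were merged:

combined_array.chunks
# ((4,), (3,)) -- each block now spans all three columns, so argsort sees full rows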

Suggestion: To help prevent unexpected and undesirable results, it may be worth alerting the user if the arguments to their mapping function (axis=1 in this case) conflict with the known chunk structure of their array. What do you think?
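
For illustration, here is a minimal sketch of what such a check could look like. warn_if_axis_chunked is a hypothetical helper, not an existing dask API; it only assumes the public Array.chunks attribute:

import warnings

import dask.array as da

def warn_if_axis_chunked(arr: da.Array, axis: int) -> None:
    # Hypothetical helper: warn when `arr` has more than one chunk along `axis`,
    # since a function mapped block-wise will only ever see partial data
    # along that axis.
    if len(arr.chunks[axis]) > 1:
        warnings.warn(
            f"Array has {len(arr.chunks[axis])} chunks along axis {axis}; "
            f"map_blocks applies the function to each chunk independently. "
            f"Consider rechunking first, e.g. arr.rechunk({{{axis}: -1}}).",
            UserWarning,
        )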

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
Madhu94 commented, Mar 19, 2021

I think this is good to close since, in the linked PR, it was decided this warning may not be needed.

0 reactions
jrbourbeau commented, Mar 20, 2021

Thanks for following up here @Madhu94


Top Results From Across the Web

dask.array.map_blocks - Dask documentation
Note that map_blocks() will concatenate chunks along axes specified by the keyword parameter drop_axis prior to applying the function.

API - Dask documentation
Apply a function repeatedly over multiple axes. arange (*args[, chunks, like, dtype]). Return evenly spaced values from start to stop with step size...

dask.array.core - Dask documentation
Array Register that a function implements the API of a NumPy function (or ... chunks=3) >>> x.map_blocks(lambda x: x * 2).compute() array([ 0,...

Overlapping Computations - Dask documentation
If depth is larger than any chunk along a particular axis, then the array is rechunked. Note that this function will attempt to...

Source code for dask.array.routines
from __future__ import annotations import math import warnings from ... Parameters ---------- m : array_like Input array. axis : None or int or...
