Flattening Dask Arrays without spreading out chunks

Currently, functions like ravel and flatten follow the same traversal order that NumPy uses. This is what users coming from NumPy expect, and some Dask Array functions rely on this behavior to match their NumPy counterparts. However, it means that ravel and flatten must spread out the chunks (rechunk the array), which can carry a non-negligible performance cost.
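To make the current behavior concrete, here is a small sketch (the array size and chunk shape are arbitrary choices for illustration). It checks that ravel on a chunked array returns elements in NumPy's row-major order, and prints the chunk layout before and after so the two can be compared:

```python
import numpy as np
import dask.array as da

# A 4x4 array split into four 2x2 chunks (sizes chosen only for illustration).
a = np.arange(16).reshape(4, 4)
x = da.from_array(a, chunks=(2, 2))

# ravel keeps NumPy's row-major traversal order.
flat = x.ravel()

# Compare the chunk layout before and after flattening.
print("chunks before:", x.chunks)
print("chunks after: ", flat.chunks)

# The flattened Dask array matches what NumPy produces.
same_order = bool((flat.compute() == a.ravel()).all())
print("matches NumPy order:", same_order)
```

On larger arrays, comparing the task graphs of `x` and `flat` (for example with `len(dict(flat.__dask_graph__()))`) is one way to see how much extra work the NumPy-ordered flatten adds.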

That said, not every use case depends on matching the order NumPy would produce when flattening an array. For those cases, a nice alternative would be to flatten each chunk on its own and simply stitch the flattened chunks together into a new 1-D array. This strategy would require no rechunking and would be embarrassingly parallel, so it would avoid the performance penalty that ravel and flatten incur today.
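A minimal sketch of the chunk-wise strategy described above, built only from existing pieces of the Dask Array API (`Array.blocks`, `Array.numblocks`, `da.concatenate`). The helper name `flatten_blockwise` is hypothetical and not part of Dask; it is one possible way to express the idea, not a proposed implementation:

```python
import numpy as np
import dask.array as da


def flatten_blockwise(x):
    """Flatten each chunk independently and concatenate the results.

    Hypothetical helper, not part of Dask. Element order follows the chunk
    layout, so it generally differs from ``x.ravel()``.
    """
    pieces = [
        # Each selected block is a single-chunk array, so raveling it
        # stays within that one chunk.
        x.blocks[idx].ravel()
        for idx in np.ndindex(*x.numblocks)
    ]
    return da.concatenate(pieces)


# A 4x4 array split into four 2x2 chunks (sizes chosen only for illustration).
x = da.from_array(np.arange(16).reshape(4, 4), chunks=(2, 2))
flat = flatten_blockwise(x)

print(flat.chunks)     # one output chunk per input chunk
print(flat.compute())  # same elements as x.ravel(), grouped by chunk
```

Each output chunk depends on exactly one input chunk, which is what makes this approach embarrassingly parallel. The trade-off is that the result contains the same elements as `x.ravel()` but in chunk order, so it only suits computations that do not depend on NumPy's ordering.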

Assuming this strategy is reasonable, there are a few ways we could go about implementing it.

  1. Allow some additional options to the order parameter to handle this need.
  2. Add a new parameter to toggle NumPy or chunk-based traversal strategies.
  3. Include a config option to enable this behavior for Dask Arrays more generally.
  4. Add a new function entirely for this behavior.
  5. ?

Thoughts on this?

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Comments: 16 (16 by maintainers)

Top GitHub Comments

jakirkham commented, May 31, 2019 (2 reactions)

Sounds like the consensus is 4, correct? Please either +1 or -1 this comment if that is either correct or incorrect.

GenevieveBuckley commented, Mar 21, 2021 (0 reactions)

@dask/triage can we add a “good first issue” label here?

We have some Summer of Code potential applicants who are looking for more labelled issues right now.
