Suggestion: provide a padding function for Dask arrays

Hi team,

I’m currently working on a dataset of 3D images that will be fed to a neural network. The input arrays have varying sizes along all 3 axes, for example: [390, 355, 355], [390, 414, 414], [398, 474, 474], [403, 474, 474], [412, 490, 490], [530, 490, 490].

All images would fit in a 530 x 490 x 490 array. I would like to pad the smaller images with 0 or, even better, with the ‘edge’ value as in numpy.pad, so that they all have the same 530 x 490 x 490 shape.

I don’t see how to do that within Dask without reverting to NumPy and using either:

  1. assignment (from https://stackoverflow.com/questions/35751306/python-how-to-pad-numpy-array-with-zeros)
import numpy as np

def pad(array, reference_shape, offsets):
    """
    array: Array to be padded
    reference_shape: tuple giving the shape of the ndarray to create
    offsets: list of offsets (number of elements must be equal to the dimension of the array)
    will throw a ValueError if offsets is too big and the reference_shape cannot handle the offsets
    """

    # Create an array of zeros with the reference shape
    result = np.zeros(reference_shape)
    # Build a tuple of slices from offset to offset + shape in each dimension
    # (NumPy needs a tuple here; a list of slices is not valid for multi-dimensional indexing)
    insert_here = tuple(slice(offsets[dim], offsets[dim] + array.shape[dim]) for dim in range(array.ndim))
    # Insert the array in the result at the specified offsets
    result[insert_here] = array
    return result
  2. or np.pad (see https://stackoverflow.com/questions/29218785/numpy-scale-3d-array)
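
For the np.pad route, a minimal NumPy sketch might look like the following. The helper name `pad_to_shape` and the choice to pad only at the end of each axis are assumptions made for illustration; they do not come from the linked answers.

```python
import numpy as np

def pad_to_shape(array, target_shape, mode="constant"):
    """Pad `array` at the end of each axis until it reaches `target_shape`.

    `pad_to_shape` is a hypothetical helper; padding only at the end of each
    axis (rather than centering the image) is an assumption.
    """
    pad_width = []
    for size, target in zip(array.shape, target_shape):
        if size > target:
            raise ValueError("array is larger than target_shape along an axis")
        # (0, n) means: add nothing before the data, n elements after it
        pad_width.append((0, target - size))
    # mode="constant" pads with zeros by default; mode="edge" repeats border values
    return np.pad(array, pad_width, mode=mode)

small = np.ones((3, 4, 4), dtype=np.uint8)
zero_padded = pad_to_shape(small, (5, 6, 6))
edge_padded = pad_to_shape(small, (5, 6, 6), mode="edge")
print(zero_padded.shape, edge_padded.shape)
```

This still materializes every image in memory as a NumPy array, which is the limitation the issue is about.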

I believe this is a very common scenario while preprocessing images for machine learning.

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
jakirkham commented, Mar 31, 2017

Also interested in this. It would be great to even have some reasonable subset of the functionality that numpy.pad provides in dask.array. In particular, the padding with zeros you have described. Are you working on something in this direction, @mratsim, or is anyone else?

0 reactions
jakirkham commented, Jun 13, 2018

Guess I forgot to mention this. 😃

Have worked up an implementation of pad for Dask Arrays in PR ( https://github.com/dask/dask/pull/3578 ). This matches NumPy’s pad API reasonably well. Could be used very easily to pad with 0s or a variety of other useful things.

Would appreciate hearing feedback on it.
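
As a rough sketch of how the use case from the issue could look with a Dask version that ships `dask.array.pad`: the helper name `dask_pad_to_shape`, the chunk sizes, and padding only at the end of each axis are all assumptions made for illustration, not part of the PR.

```python
import dask.array as da

def dask_pad_to_shape(array, target_shape, mode="constant"):
    """Lazily pad a Dask array at the end of each axis up to `target_shape`.

    Hypothetical helper: the name and the end-only padding are assumptions.
    """
    pad_width = []
    for size, target in zip(array.shape, target_shape):
        if size > target:
            raise ValueError("array is larger than target_shape along an axis")
        pad_width.append((0, target - size))
    # mode="constant" pads with zeros by default; mode="edge" repeats border values
    return da.pad(array, pad_width, mode=mode)

# One of the shapes from the issue; the chunking is an arbitrary choice
image = da.ones((390, 355, 355), chunks=(130, 355, 355), dtype="uint8")
padded = dask_pad_to_shape(image, (530, 490, 490), mode="edge")
print(padded.shape)  # nothing is computed yet; the result is still lazy
```

Calling `padded.compute()` would materialize the full 530 x 490 x 490 array, so in practice the padded arrays would more likely be stacked or fed onward lazily.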
