Suggestion: provide a padding function for Dask arrays
Hi team,
I’m currently working on a dataset of 3D images that will be fed to a neural network. The input arrays vary in size along all three axes, for example: [390, 355, 355], [390, 414, 414], [398, 474, 474], [403, 474, 474], [412, 490, 490], [530, 490, 490].
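The common target shape does not have to be hard-coded. A minimal sketch in plain Python, using only the shapes listed above, takes the per-axis maximum and derives how much padding each image needs (the variable names are illustrative):

```python
# Shapes of the 3D images listed in the issue.
shapes = [
    (390, 355, 355),
    (390, 414, 414),
    (398, 474, 474),
    (403, 474, 474),
    (412, 490, 490),
    (530, 490, 490),
]

# The smallest shape that can hold every image is the per-axis maximum.
target_shape = tuple(max(dims) for dims in zip(*shapes))
print(target_shape)

# Padding each image needs along each axis to reach target_shape.
pad_amounts = [tuple(t - s for t, s in zip(target_shape, shape)) for shape in shapes]
print(pad_amounts[0])
```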
All images would fit in a 530 x 490 x 490 array. I would like to pad the smaller images with 0, or even better with the ‘edge’ value as in `numpy.pad`, so they all have the same 530 x 490 x 490 shape.
I don’t see how to do that within Dask without reverting to NumPy and using either:
- assignment (from https://stackoverflow.com/questions/35751306/python-how-to-pad-numpy-array-with-zeros)
```python
import numpy as np


def pad(array, reference_shape, offsets):
    """
    array: Array to be padded
    reference_shape: tuple of size of ndarray to create
    offsets: list of offsets (number of elements must be equal to the dimension of the array)
    will throw a ValueError if offsets is too big and the reference_shape cannot handle the offsets
    """
    # Create an array of zeros with the reference shape (keeping the input dtype)
    result = np.zeros(reference_shape, dtype=array.dtype)
    # Build a tuple of slices from offset to offset + shape in each dimension;
    # NumPy requires a tuple (not a list) of slices for multidimensional indexing
    insert_here = tuple(
        slice(offsets[dim], offsets[dim] + array.shape[dim]) for dim in range(array.ndim)
    )
    # Insert the array in the result at the specified offsets
    result[insert_here] = array
    return result
```
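For the ‘edge’ behaviour, `np.pad` can be called directly once the per-axis padding is computed from the target shape. A minimal sketch, assuming all padding goes after the data (the equivalent of offsets of 0 in the function above); `pad_to_shape` is a hypothetical helper name:

```python
import numpy as np


def pad_to_shape(array, target_shape, mode="constant", **kwargs):
    """Pad `array` at the end of each axis so that it has `target_shape`.

    Hypothetical helper: `mode` and `kwargs` are forwarded to np.pad, so
    mode="edge" repeats the border values instead of writing zeros.
    """
    if len(target_shape) != array.ndim:
        raise ValueError("target_shape must have one entry per axis")
    # (before, after) padding per axis; all padding is placed after the data.
    pad_width = [(0, t - s) for s, t in zip(array.shape, target_shape)]
    if any(after < 0 for _, after in pad_width):
        raise ValueError("array is larger than target_shape along some axis")
    return np.pad(array, pad_width, mode=mode, **kwargs)


small = np.arange(6).reshape(2, 3)
zeros = pad_to_shape(small, (3, 4))               # constant mode pads with 0
edges = pad_to_shape(small, (3, 4), mode="edge")  # repeats the last row/column
print(zeros)
print(edges)
```

To center an image instead, split `t - s` between the before and after entries of `pad_width`.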
I believe this is a very common scenario when preprocessing images for machine learning.
Issue Analytics
- Created 7 years ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
Also interested in this. It would be great to have even a reasonable subset of the functionality that `numpy.pad` provides in `dask.array`, in particular the zero padding you have described. Are you working on something in this direction, @mratsim, or is anyone else?

Guess I forgot to mention this. 😃 I have worked up an implementation of `pad` for Dask Arrays in PR ( https://github.com/dask/dask/pull/3578 ). It matches NumPy’s `pad` API reasonably well and could easily be used to pad with `0`s or a variety of other useful things. Would appreciate hearing feedback on it.
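Current Dask releases ship `dask.array.pad`, which follows the `np.pad(array, pad_width, mode, **kwargs)` calling convention, so padding can stay lazy and chunked without converting to NumPy. A minimal sketch, assuming a Dask version that provides `da.pad`; `dask_pad_to_shape` is a hypothetical helper, not part of Dask:

```python
import dask.array as da


def dask_pad_to_shape(x, target_shape, mode="constant", **kwargs):
    """Lazily pad a Dask array at the end of each axis up to `target_shape`.

    Hypothetical helper: `mode` and `kwargs` are forwarded to dask.array.pad.
    """
    if len(target_shape) != x.ndim:
        raise ValueError("target_shape must have one entry per axis")
    # (before, after) padding per axis; all padding is placed after the data.
    pad_width = [(0, t - s) for s, t in zip(x.shape, target_shape)]
    if any(after < 0 for _, after in pad_width):
        raise ValueError("x is larger than target_shape along some axis")
    return da.pad(x, pad_width, mode=mode, **kwargs)


# A small chunked stand-in for one 3D image.
image = da.ones((4, 3, 3), chunks=2)
padded = dask_pad_to_shape(image, (5, 4, 4))  # zero padding, still lazy
print(padded.shape)                           # shape is known before computing
result = padded.compute()                     # materializes a NumPy array
```

Passing `mode="edge"` is forwarded the same way for the edge-value padding requested in the issue.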