ENH: Faster array padding
Dear devs,
As suggested in https://github.com/numpy/numpy/pull/11033#issuecomment-386039128, the current implementation of numpy.pad uses copies more than necessary. Most of the pad modes currently use numpy.concatenate under the hood to create the new array, which has to happen twice for each padded axis. I think it would be faster to pre-allocate the returned array once with the correct final shape and then just set the appropriate edge values.
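For comparison, the concatenate-based approach for a single axis in `constant` mode looks roughly like the sketch below (the helper name is made up for illustration, not the actual internal function):

```python
import numpy as np


def pad_const_concat(arr, axis, width, value=0):
    """Constant-pad `arr` along one axis the concatenate-based way:
    two concatenations per padded axis, each copying the data again."""
    shape_before = list(arr.shape)
    shape_before[axis] = width[0]
    shape_after = list(arr.shape)
    shape_after[axis] = width[1]
    before = np.full(shape_before, value, dtype=arr.dtype)
    after = np.full(shape_after, value, dtype=arr.dtype)
    arr = np.concatenate((before, arr), axis=axis)  # first full copy
    arr = np.concatenate((arr, after), axis=axis)   # second full copy
    return arr
```

For an n-dimensional array this intermediate copying repeats for every padded axis, which is exactly the overhead pre-allocation would avoid.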
Here is a first draft of a function that would pre-allocate an array with padded shape and undefined content in the padded areas.
```python
import numpy as np


def _pad_empty(arr, pad_amt):
    """Pad array with undefined values.

    Parameters
    ----------
    arr : ndarray
        Array to grow.
    pad_amt : sequence of tuple[int, int]
        Pad width on both sides for each dimension in `arr`.

    Returns
    -------
    padded : ndarray
        Larger array with undefined values in padded areas.
    """
    # Allocate grown array
    new_shape = tuple(s + sum(p) for s, p in zip(arr.shape, pad_amt))
    padded = np.empty(new_shape, dtype=arr.dtype)

    # Copy old array into correct space
    old_area = tuple(
        slice(None if left == 0 else left, None if right == 0 else -right)
        for left, right in pad_amt
    )
    padded[old_area] = arr

    return padded
```
These undefined pad areas could then be filled by simple value assignment, e.g. with new helpers such as `_set_const_after`, `_set_mean_before`, etc. I think this would be significantly faster, and I (kind of) tested this already with the suggested function `_fast_pad` in https://github.com/scikit-image/scikit-image/pull/3022.
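To make the fill-by-assignment idea concrete, here is a self-contained sketch for the `constant` mode (the `_set_const_*` helper signatures are my own guess at what such helpers could look like, not an existing API; `_pad_empty` is repeated from above so the snippet runs on its own):

```python
import numpy as np


def _pad_empty(arr, pad_amt):
    # Same as the draft above: allocate once, copy the old array in.
    new_shape = tuple(s + sum(p) for s, p in zip(arr.shape, pad_amt))
    padded = np.empty(new_shape, dtype=arr.dtype)
    old_area = tuple(
        slice(None if left == 0 else left, None if right == 0 else -right)
        for left, right in pad_amt
    )
    padded[old_area] = arr
    return padded


def _set_const_before(padded, axis, width, value):
    # Hypothetical helper: fill the leading pad area of `axis` in place.
    sl = [slice(None)] * padded.ndim
    sl[axis] = slice(None, width)
    padded[tuple(sl)] = value


def _set_const_after(padded, axis, width, value):
    # Hypothetical helper: fill the trailing pad area of `axis` in place.
    sl = [slice(None)] * padded.ndim
    sl[axis] = slice(padded.shape[axis] - width, None)
    padded[tuple(sl)] = value


def fast_pad_constant(arr, pad_amt, value=0):
    # One allocation, then plain assignments into the pad areas.
    padded = _pad_empty(arr, pad_amt)
    for axis, (left, right) in enumerate(pad_amt):
        if left:
            _set_const_before(padded, axis, left, value)
        if right:
            _set_const_after(padded, axis, right, value)
    return padded
```

No intermediate arrays are created here; each pad area is written exactly once.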
If you like this idea, I’d be happy to make a PR that addresses this after #11012 is resolved one way or another. I’m looking forward to your feedback.
Issue Analytics
- Created 5 years ago
- Comments: 7 (7 by maintainers)
@eric-wieser Great!

It seems like there aren't any benchmarks covering the `pad` function yet, so I think it would be useful to add these first in order to actually measure speed improvements objectively. Here is a short benchmark that can be used to cover the `constant` boundary mode case. Whatever benchmark gets created should definitely cover out-of-cache operations as well, as those are likely to be quite costly.

cc @jakirkham
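A minimal sketch of such a benchmark, in the asv style used by NumPy's benchmark suite (the shapes and pad widths here are illustrative assumptions; a real benchmark should also include arrays larger than the CPU cache):

```python
import numpy as np


class Pad:
    """Benchmark np.pad with mode='constant' over a few shapes and widths."""

    params = [
        [(1000,), (100, 100), (10, 10, 10)],  # illustrative array shapes
        [1, 8],                               # illustrative pad widths
    ]
    param_names = ["shape", "pad_width"]

    def setup(self, shape, pad_width):
        self.array = np.ones(shape)

    def time_pad_constant(self, shape, pad_width):
        np.pad(self.array, pad_width, mode="constant")
```

asv would time `time_pad_constant` for every combination of shape and pad width, giving a baseline to compare the pre-allocation approach against.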