Add distributed proximity distance grids using Dask
Given a 2D raster image and a set of target pixels in that raster, the proximity function computes a raster in which each pixel holds the distance to its nearest target pixel. The distance metric can be one of the following: Euclidean, Great-Circle, or Manhattan. The most naive approach calculates the distance from each pixel to every target pixel and keeps the smallest one. This takes `m*n*t` calculations, where `m*n` is the size of the raster and `t` is the number of targets.
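For reference, here is a minimal pure-Python sketch of that brute-force approach. The function names (`naive_proximity`, `euclidean`, `manhattan`) are made up for illustration and are not part of xrspatial; Great-Circle distance is left out because it needs real coordinates rather than pixel indices.

```python
import math

def euclidean(a, b):
    # a and b are (row, col) positions in pixel space
    return math.hypot(a[0] - b[0], a[1] - b[1])

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def naive_proximity(is_target, distance=euclidean):
    """Brute force: compare every cell with every target, m*n*t distance calls."""
    height, width = len(is_target), len(is_target[0])
    targets = [(i, j) for i in range(height) for j in range(width) if is_target[i][j]]
    return [
        # default=inf covers the case where the raster has no targets at all
        [min((distance((i, j), t) for t in targets), default=math.inf)
         for j in range(width)]
        for i in range(height)
    ]

# 1 marks a target pixel
grid = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
prox = naive_proximity(grid, manhattan)
```

This is only useful as a correctness baseline on small inputs; the cost grows with the number of targets.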
The current implementation of xrspatial.proximity function is ported from GDAL with some modifications to make it work with xarray.DataArray. This notebook shows how to use the function. To keep it simple, let’s consider the problem at a 2D array level instead of a 2D xarray DataArray. The algorithm can be described as follows:
Inputs:
- Raster image `I`, a `height x width` 2D array
- Set of target pixels `T`. All target pixels are in `I`
- Distance metric `d()`

Output: Proximity raster `P`, where `P[i, j]` is the distance from cell `(i, j)` to its nearest target pixel
Idea: Use dynamic programming to identify the nearest target pixel of a pixel based on the nearest target pixels of its `3x3` neighborhood window.
Detailed implementation:
- Let `Nx` be a 1D array of `width` elements: `Nx[j]` is the x-position in pixel space of the nearest target pixel of pixel `(i, j)`. `Nx[j]` can have a value in `[0, width-1]`.
- Let `Ny` be a 1D array of `width` elements: `Ny[j]` is the y-position in pixel space of the nearest target pixel of pixel `(i, j)`. `Ny[j]` can have a value in `[0, height-1]`. Values of `Nx` and `Ny` will be updated row by row.
- Initially,
  - set `Nx[j] = -1` for all `j`.
  - set `Ny[j] = -1` for all `j`.
  - for all `(i, j)`, set `P[i, j] = 0` if cell `(i, j)` is a target pixel, `P[i, j] = infinity` otherwise.
  - note that distance `d( (i1, j1), (i2, j2) ) = infinity` if any of the inputs `i1, j1, i2, j2` equals -1 (i.e., an invalid pixel).
- Traverse the image row by row from top to bottom:
  - traverse each row from left to right
  - traverse each row from right to left
- Reset `Nx[j] = -1` and `Ny[j] = -1` for all `j`.
- Traverse the image in reverse order from bottom to top:
  - traverse each row from right to left
  - traverse each row from left to right
The formula to update `P[i, j]`, `Ny[j]`, and `Nx[j]` at each cell `(i, j)`:
P[i, j] = min(
P[i, j],
d( (Ny[j], Nx[j]), (i, j) ), # Are we nearer to the closest target to the above (below) pixel?
d( (Ny[j-1], Nx[j-1]), (i, j) ), # Are we nearer to the closest target to the left (right) pixel?
d( (Ny[j+1], Nx[j+1]), (i, j) ), # Are we nearer to the closest target to the top right (bottom left) pixel?
)
Update `Ny[j]` and `Nx[j]` accordingly:
- `Ny[j], Nx[j] = i, j` if `P[i, j] = 0`
- `Ny[j], Nx[j] = Ny[j-1], Nx[j-1]` if `P[i, j]` is updated as `d( (Ny[j-1], Nx[j-1]), (i, j) )`
- `Ny[j], Nx[j] = Ny[j+1], Nx[j+1]` if `P[i, j]` is updated as `d( (Ny[j+1], Nx[j+1]), (i, j) )`
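To make the traversal concrete, here is a rough pure-Python sketch of the two passes described above. It is my reading of the steps, not the actual xrspatial or GDAL code. Assumptions: `Ny` and `Nx` are merged into one list `near` of `(y, x)` tuples, `None` plays the role of `-1`, and `near[j]` always keeps the closest of the three candidates seen in the current pass. `two_pass_proximity` and `scan_row` are hypothetical names.

```python
import math

def euclidean(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def two_pass_proximity(is_target, distance=euclidean):
    height, width = len(is_target), len(is_target[0])
    # P[i][j] = 0 on targets, infinity elsewhere
    P = [[0.0 if is_target[i][j] else math.inf for j in range(width)]
         for i in range(height)]
    # near[j] = (y, x) of the nearest known target for column j, None if unknown
    near = [None] * width

    def scan_row(i, cols, step):
        # step is +1 for a left-to-right scan, -1 for a right-to-left scan
        for j in cols:
            if is_target[i][j]:
                near[j] = (i, j)
                continue
            best, best_d = None, math.inf
            # candidates: same column, the cell already visited in this scan,
            # and the cell not yet visited (still holding the previous row's value)
            for k in (j, j - step, j + step):
                if 0 <= k < width and near[k] is not None:
                    d = distance(near[k], (i, j))
                    if d < best_d:
                        best, best_d = near[k], d
            near[j] = best
            P[i][j] = min(P[i][j], best_d)

    # top to bottom: each row left to right, then right to left
    for i in range(height):
        scan_row(i, range(width), +1)
        scan_row(i, range(width - 1, -1, -1), -1)
    # reset the nearest-target bookkeeping between the two passes
    near[:] = [None] * width
    # bottom to top: each row right to left, then left to right
    for i in range(height - 1, -1, -1):
        scan_row(i, range(width - 1, -1, -1), -1)
        scan_row(i, range(width), +1)
    return P

grid = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
prox = two_pass_proximity(grid)
```

Each row depends on the state left by the previous row, which is why this version is hard to parallelize directly.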
We’re looking for a distributed implementation of the proximity function that works with Dask. The explicit questions are:
- The calculations are currently performed sequentially. How can they be parallelized?
- If we divide the 2D input data into a set of smaller chunks, how do we compute the proximity distance grid chunk by chunk? The primary question is how to determine the nearest target pixel for every pixel in a chunk when target pixels can lie outside the chunk.
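One possible direction for the second question, sketched in plain Python under a strong assumption: the set of target coordinates is small enough to collect once and hand to every chunk. Each chunk then only needs its own offsets in the full raster and can be processed independently. All names here (`target_coords`, `chunk_proximity`, `chunked_proximity`) are hypothetical. With Dask, the per-chunk function could presumably be applied with `dask.array.map_blocks`, using its `block_info` argument to recover the offsets, and the brute-force minimum could be swapped for a spatial index query.

```python
import math

def target_coords(is_target):
    """Global (row, col) positions of all target pixels."""
    return [(i, j) for i, row in enumerate(is_target)
            for j, v in enumerate(row) if v]

def chunk_proximity(row_start, row_stop, col_start, col_stop, targets):
    """Proximity for one chunk, computed in global coordinates so that
    targets lying outside the chunk are still taken into account."""
    return [
        [min((math.hypot(i - ti, j - tj) for ti, tj in targets), default=math.inf)
         for j in range(col_start, col_stop)]
        for i in range(row_start, row_stop)
    ]

def chunked_proximity(is_target, chunk_rows, chunk_cols):
    height, width = len(is_target), len(is_target[0])
    targets = target_coords(is_target)  # gathered once, shared by all chunks
    out = [[math.inf] * width for _ in range(height)]
    # the chunks are independent, so this double loop is the part
    # a scheduler could run in parallel
    for r0 in range(0, height, chunk_rows):
        for c0 in range(0, width, chunk_cols):
            r1 = min(r0 + chunk_rows, height)
            c1 = min(c0 + chunk_cols, width)
            block = chunk_proximity(r0, r1, c0, c1, targets)
            for di, row in enumerate(block):
                out[r0 + di][c0:c1] = row
    return out

grid = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
prox = chunked_proximity(grid, chunk_rows=2, chunk_cols=2)
```

The result does not depend on the chunk shape, since every chunk sees the full target list. The open cost is collecting the targets up front, which this sketch does not try to distribute.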
Top GitHub Comments
https://nbviewer.jupyter.org/gist/TomAugspurger/78b13fe480c6f427b074f6148ec08637 has a (rough) sketch of what we talked about this morning. It covers:
- `dask_ml.metrics.pairwise_distances_argmin_min`
- `sklearn.neighbors.KDTree` + `map_blocks` on a Dask Array

@TomAugspurger very helpful thanks. We will run with these ideas and report back.