Add distributed proximity distance grids using Dask

Given a 2D raster image and a set of target pixels in that raster, the proximity function computes a proximity raster that gives the distance from each pixel to the nearest target pixel. The distance metric can be Euclidean, great-circle, or Manhattan. The most naive approach computes the distance from each pixel to every target pixel and keeps the minimum. This takes m*n*t distance calculations, where m*n is the size of the raster and t is the number of targets.
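
As a baseline, the naive m*n*t approach can be sketched in a few lines of plain Python. This is only an illustration, not the xrspatial code: the names naive_proximity, euclidean and manhattan are hypothetical, pixels are (row, column) tuples, and the great-circle metric is left out because it needs real-world coordinates.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two (row, col) pixels."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """City-block distance between two (row, col) pixels."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def naive_proximity(height, width, targets, d=euclidean):
    """Brute force: for every pixel, scan every target and keep the minimum.

    Costs height * width * len(targets) distance evaluations.
    Pixels get math.inf when there are no targets at all.
    """
    return [
        [min((d((i, j), t) for t in targets), default=math.inf)
         for j in range(width)]
        for i in range(height)
    ]


# Example: a 2 x 3 raster with a single target in the top-left corner.
grid = naive_proximity(2, 3, [(0, 0)], d=manhattan)
```

The cost grows with the number of targets, which is what the row-by-row scheme described next tries to avoid.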

The current implementation of the xrspatial.proximity function is ported from GDAL, with some modifications to make it work with xarray.DataArray. This notebook shows how to use the function. To keep things simple, let’s consider the problem on a plain 2D array instead of a 2D xarray DataArray. The algorithm can be described as follows:

Inputs:

Raster image I, a height x width 2D array
Set of target pixels T. All target pixels are in I
Distance metric d()

Output: Proximity raster P where P[i, j] is the distance from cell (i, j) to its nearest target pixel

Idea: Use dynamic programming to identify the nearest target pixel of a pixel based on the nearest target pixels of its 3x3 neighborhood window.

Detailed implementation:

Let Nx be a 1D array of width elements: Nx[j] is the x-position in pixel space of the nearest target pixel of pixel (i, j). Nx[j] can have a value in [0, width-1].
Let Ny be a 1D array of width elements: Ny[j] is the y-position in pixel space of the nearest target pixel of pixel (i, j). Ny[j] can have a value in [0, height-1]. The values of Nx and Ny are updated row by row.

Initially,

set Nx[j] = -1 for all j.
set Ny[j] = -1 for all j.
for all (i, j), set P[i, j] = 0 if cell (i, j) is a target pixel, P[i, j] = infinity otherwise.
note that the distance d( (i1, j1), (i2, j2) ) = infinity if any of the inputs i1, j1, i2, j2 equals -1 (i.e., an invalid pixel)

Traverse the image row by row from top to bottom. For each row:

first traverse the row from left to right
then traverse the same row from right to left

Reset Nx[j] = -1 and Ny[j] = -1 for all j.

Traverse the image in reverse order, row by row from bottom to top. For each row:

first traverse the row from right to left
then traverse the same row from left to right

At each cell (i, j) visited during a traversal, P[i, j] is updated as follows:

P[i, j] = min(
    P[i, j],                   
    d( (Ny[j], Nx[j]), (i, j) ),     # Are we nearer to the closest target to the above (below) pixel?
    d( (Ny[j-1], Nx[j-1]), (i, j) ), # Are we nearer to the closest target to the left (right) pixel?
    d( (Ny[j+1], Nx[j+1]), (i, j) ), # Are we nearer to the closest target to the top right (bottom left) pixel?
)

Update Ny[j] and Nx[j] accordingly:

Ny[j], Nx[j] = i, j if P[i, j] = 0
Ny[j], Nx[j] stay unchanged if P[i, j] is updated as d( (Ny[j], Nx[j]), (i, j) )
Ny[j], Nx[j] = Ny[j-1], Nx[j-1] if P[i, j] is updated as d( (Ny[j-1], Nx[j-1]), (i, j) )
Ny[j], Nx[j] = Ny[j+1], Nx[j+1] if P[i, j] is updated as d( (Ny[j+1], Nx[j+1]), (i, j) )
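
The two passes and the update rule above can be put together roughly as follows. This is a minimal pure-Python sketch written from the description in this issue, not the xrspatial or GDAL source, and it has not been validated against either. The names proximity, _sweep_row and INVALID are hypothetical, and the input is assumed to be a list of lists of booleans marking the target pixels.

```python
import math

INVALID = -1  # marks "no nearest target known yet"


def euclidean(p, q):
    """Distance between (row, col) pixels; infinite if either is invalid."""
    if INVALID in (*p, *q):
        return math.inf
    return math.hypot(p[0] - q[0], p[1] - q[1])


def _sweep_row(is_target, P, Ny, Nx, i, forward, d):
    """One left-to-right (forward) or right-to-left sweep over row i."""
    width = len(Nx)
    step = 1 if forward else -1
    cols = range(width) if forward else range(width - 1, -1, -1)
    for j in cols:
        if is_target[i][j]:
            P[i][j] = 0.0
            Ny[j], Nx[j] = i, j
            continue
        # Start from the target remembered for this column, then try the
        # targets remembered for the two horizontal neighbours.
        best, src = d((Ny[j], Nx[j]), (i, j)), j
        for k in (j - step, j + step):
            if 0 <= k < width:
                dist = d((Ny[k], Nx[k]), (i, j))
                if dist < best:
                    best, src = dist, k
        Ny[j], Nx[j] = Ny[src], Nx[src]
        P[i][j] = min(P[i][j], best)


def proximity(is_target, d=euclidean):
    """Top-down pass followed by a bottom-up pass, two sweeps per row."""
    height, width = len(is_target), len(is_target[0])
    P = [[0.0 if is_target[i][j] else math.inf for j in range(width)]
         for i in range(height)]
    passes = ((range(height), True), (range(height - 1, -1, -1), False))
    for rows, first_forward in passes:
        Ny = [INVALID] * width  # reset before each pass
        Nx = [INVALID] * width
        for i in rows:
            _sweep_row(is_target, P, Ny, Nx, i, first_forward, d)
            _sweep_row(is_target, P, Ny, Nx, i, not first_forward, d)
    return P
```

Each row depends on the Nx and Ny values left behind by the previous row, which is why the computation is inherently sequential in this form.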

We’re looking for a distributed implementation of the proximity function that works with Dask. The specific questions are:

The calculations are currently performed sequentially. How can they be parallelized?
If the 2D input data is divided into a set of smaller chunks, how can the proximity distance grid be computed chunk by chunk? The primary question is how to determine the nearest target pixel for every pixel in a chunk when that target pixel can lie outside the chunk.

Issue Analytics

Created 2 years ago
Comments: 6 (3 by maintainers)

Top GitHub Comments

TomAugspurger commented, Jul 13, 2021

https://nbviewer.jupyter.org/gist/TomAugspurger/78b13fe480c6f427b074f6148ec08637 has a (rough) sketch of what we talked about this morning. It:

Builds up an array of indices for each point
Computes the closest target / distance using dask_ml.metrics.pairwise_distances_argmin_min
Computes the closest target using an sklearn.neighbors.KDTree + map_blocks on a Dask Array
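
The idea behind the chunk-by-chunk variants can be illustrated without Dask. The sketch below is not the code from the linked notebook. It assumes the global target coordinates are collected once and then handed to every chunk, so a chunk can find its nearest target even when that target lies outside the chunk. A brute-force search stands in for the KDTree query, and concurrent.futures stands in for map_blocks and the Dask scheduler. The names chunk_proximity and chunked_proximity are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def chunk_proximity(targets, row_start, row_stop, col_start, col_stop):
    """Proximity for one chunk, using global pixel coordinates.

    The chunk only needs its own bounds and the global target list.
    """
    return [
        [min((math.hypot(i - ti, j - tj) for ti, tj in targets),
             default=math.inf)
         for j in range(col_start, col_stop)]
        for i in range(row_start, row_stop)
    ]


def chunked_proximity(is_target, chunk_rows, chunk_cols):
    height, width = len(is_target), len(is_target[0])
    # Step 1: one pass over the raster to collect global target coordinates.
    targets = [(i, j) for i in range(height) for j in range(width)
               if is_target[i][j]]
    # Step 2: chunk bounds as (row_start, row_stop, col_start, col_stop).
    bounds = [(r, min(r + chunk_rows, height), c, min(c + chunk_cols, width))
              for r in range(0, height, chunk_rows)
              for c in range(0, width, chunk_cols)]
    # Step 3: chunks are independent, so they can be mapped over in any order.
    with ThreadPoolExecutor() as pool:
        blocks = list(pool.map(lambda b: chunk_proximity(targets, *b), bounds))
    # Step 4: stitch the per-chunk results back into one raster.
    P = [[math.inf] * width for _ in range(height)]
    for (r0, r1, c0, c1), block in zip(bounds, blocks):
        for i in range(r0, r1):
            P[i][c0:c1] = block[i - r0]
    return P
```

This trades the row-to-row dependency of the sweep algorithm for an up-front pass that gathers the targets. The per-chunk cost then depends on how the nearest target is looked up, which is where a spatial index such as a KD-tree would come in.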

brendancol commented, Jul 13, 2021

@TomAugspurger very helpful, thanks. We will run with these ideas and report back.