Add distributed proximity distance grids using Dask

Given a 2D raster image and a set of target pixels in that raster, the proximity function computes a raster in which each pixel holds the distance to its nearest target pixel. The distance metric can be one of the following: Euclidean, Great-Circle, or Manhattan. The most naive approach computes the distance from every pixel to every target pixel and keeps the minimum. This takes m*n*t distance calculations, where m*n is the size of the raster and t is the number of targets.
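As a point of reference, the naive m*n*t approach can be sketched in a few lines of plain Python. This is only an illustration: the names proximity_naive, euclidean, and manhattan are hypothetical and not part of xrspatial, the raster is assumed to be a list of rows of booleans marking target pixels, and the Great-Circle metric is left out because it needs geographic coordinates rather than pixel indices.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two (row, col) pixel positions."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """City-block distance between two (row, col) pixel positions."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def proximity_naive(is_target, dist=euclidean):
    """Brute force: compare every pixel against every target (m*n*t distances)."""
    height, width = len(is_target), len(is_target[0])
    targets = [
        (i, j) for i in range(height) for j in range(width) if is_target[i][j]
    ]
    # With no targets at all, every pixel keeps an infinite distance.
    return [
        [
            min((dist((i, j), t) for t in targets), default=math.inf)
            for j in range(width)
        ]
        for i in range(height)
    ]
```

A brute-force version like this is mostly useful as a ground truth for checking faster implementations on small rasters.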

The current implementation of the xrspatial.proximity function is ported from GDAL, with some modifications to make it work with xarray.DataArray. This notebook shows how to use the function. To keep it simple, let’s consider the problem at the level of a plain 2D array instead of a 2D xarray DataArray. The algorithm can be described as follows:

Inputs:

  • Raster image I, a height x width 2D array
  • Set of target pixels T. All target pixels are in I
  • Distance metric d()

Output: Proximity raster P where P[i, j] is the distance from cell (i, j) to its nearest target pixel

Idea: Use dynamic programming to identify the nearest target pixel of a pixel based on the nearest target pixels of its 3x3 neighborhood window.

Detailed implementation:

  • Let Nx be a 1D array of width elements: Nx[j] is the x-position, in pixel space, of the nearest target pixel of pixel (i, j). Nx[j] takes a value in [0, width-1].
  • Let Ny be a 1D array of width elements: Ny[j] is the y-position, in pixel space, of the nearest target pixel of pixel (i, j). Ny[j] takes a value in [0, height-1]. Values of Nx and Ny are updated row by row.
  1. Initially,
  • set Nx[j] = -1 for all j.
  • set Ny[j] = -1 for all j.
  • for all (i, j), set P[i, j] = 0 if cell (i, j) is a target pixel, and P[i, j] = infinity otherwise.
  • note that the distance d( (i1, j1), (i2, j2) ) = infinity if any of the inputs i1, j1, i2, j2 equals -1 (i.e., an invalid pixel).
  2. Traverse the image row by row from top to bottom. For each row:
  • traverse the row from left to right, then
  • traverse the row from right to left.
  3. Reset Nx[j] = -1 and Ny[j] = -1 for all j.
  4. Traverse the image in reverse order, from bottom to top. For each row:
  • traverse the row from right to left, then
  • traverse the row from left to right.

The formula to update P[i, j], Ny[j], and Nx[j] at each cell (i, j) is:

P[i, j] = min(
    P[i, j],                   
    d( (Ny[j], Nx[j]), (i, j) ),     # Are we nearer to the closest target to the above (below) pixel?
    d( (Ny[j-1], Nx[j-1]), (i, j) ), # Are we nearer to the closest target to the left (right) pixel?
    d( (Ny[j+1], Nx[j+1]), (i, j) ), # Are we nearer to the closest target to the top right (bottom left) pixel?
)

Update Ny[j] and Nx[j] accordingly:

  • Ny[j], Nx[j] = i, j if P[i, j] = 0
  • Ny[j], Nx[j] = Ny[j-1], Nx[j-1] if P[i, j] is updated as d( (Ny[j-1], Nx[j-1]), (i, j) )
  • Ny[j], Nx[j] = Ny[j+1], Nx[j+1] if P[i, j] is updated as d( (Ny[j+1], Nx[j+1]), (i, j) )
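The two-pass sweep described above can be sketched in plain Python as follows. This is one reading of the description, not the xrspatial or GDAL source: the names proximity_sweep and _sweep_row are hypothetical, the raster is assumed to be a list of rows of booleans marking target pixels, and one detail is an assumption of this sketch: Ny[j] and Nx[j] are set to the nearest of the three candidates in the current sweep even when P[i, j] already holds a smaller value from an earlier pass.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two (row, col) pixel positions."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def _sweep_row(P, is_target, Ny, Nx, i, cols, dist):
    """Update row i of P (and Ny, Nx) while visiting columns in `cols` order."""
    width = len(Nx)
    for j in cols:
        if is_target[i][j]:
            P[i][j] = 0.0
            Ny[j], Nx[j] = i, j
            continue
        # Candidates: nearest targets remembered at columns j, j-1 and j+1.
        best, best_k = math.inf, -1
        for k in (j, j - 1, j + 1):
            if 0 <= k < width and Nx[k] != -1:  # -1 marks "no target known"
                d = dist((Ny[k], Nx[k]), (i, j))
                if d < best:
                    best, best_k = d, k
        if best_k != -1:
            Ny[j], Nx[j] = Ny[best_k], Nx[best_k]
            P[i][j] = min(P[i][j], best)


def proximity_sweep(is_target, dist=euclidean):
    """Top-to-bottom pass, reset Nx/Ny, then bottom-to-top pass."""
    height, width = len(is_target), len(is_target[0])
    P = [[0.0 if t else math.inf for t in row] for row in is_target]
    ltr = range(width)
    rtl = range(width - 1, -1, -1)
    passes = (
        (range(height), (ltr, rtl)),              # step 2 of the description
        (range(height - 1, -1, -1), (rtl, ltr)),  # step 4 of the description
    )
    for rows, orders in passes:
        Ny = [-1] * width  # steps 1 and 3: forget all remembered targets
        Nx = [-1] * width
        for i in rows:
            for cols in orders:
                _sweep_row(P, is_target, Ny, Nx, i, cols, dist)
    return P
```

Because each cell only inspects three remembered candidates, the sweep does a constant amount of work per pixel per pass, but every row depends on the state left behind by the previous row, which is what makes it hard to parallelize directly.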

We’re looking for a distributed implementation of the proximity function that works with Dask. The specific questions are:

  1. The calculations are currently performed sequentially. How can they be parallelized?
  2. If the 2D input data is divided into a set of smaller chunks, how can the proximity distance grid be computed chunk by chunk? The primary question is how to determine the nearest target pixel for every pixel in a chunk when that target pixel may lie outside the chunk.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
TomAugspurger commented, Jul 13, 2021

https://nbviewer.jupyter.org/gist/TomAugspurger/78b13fe480c6f427b074f6148ec08637 has a (rough) sketch of what we talked about this morning. It

  • builds up an array of indices for each point
  • computes the closest target / distance using dask_ml.metrics.pairwise_distances_argmin_min
  • computes the closest target using an sklearn.neighbors.KDTree + map_blocks on a Dask Array

0 reactions
brendancol commented, Jul 13, 2021

@TomAugspurger very helpful thanks. We will run with these ideas and report back.
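
For readers following along, the KDTree + map_blocks idea mentioned above might look roughly like the sketch below. It is not the code from the linked gist: the function name proximity_kdtree is hypothetical, only the Euclidean metric in pixel space is handled, and the sketch assumes that the raster contains at least one target and that the list of target coordinates is small enough to hold in memory on every worker. Each chunk rebuilds its own global pixel coordinates from block_info and queries a tree built over all targets, so a target outside the chunk is still found.

```python
import numpy as np
import dask.array as da
from sklearn.neighbors import KDTree


def proximity_kdtree(mask):
    """Euclidean proximity for a 2D boolean Dask array of target pixels."""
    # Assumption: the (t, 2) array of target coordinates fits in memory.
    targets = da.argwhere(mask).compute()
    tree = KDTree(targets)

    def block_fn(block, block_info=None):
        # Global (start, stop) index range of this chunk along each axis.
        (r0, r1), (c0, c1) = block_info[0]["array-location"]
        ii, jj = np.mgrid[r0:r1, c0:c1]
        pixels = np.column_stack([ii.ravel(), jj.ravel()])
        dist, _ = tree.query(pixels, k=1)  # nearest target over the whole raster
        return dist.reshape(block.shape)

    # Passing meta explicitly avoids calling block_fn on a dummy empty block.
    return mask.map_blocks(block_fn, dtype=float, meta=np.array((), dtype=float))
```

Calling proximity_kdtree(da.from_array(mask, chunks=(512, 512))).compute() on a NumPy boolean array mask would then evaluate the chunks independently. The trade-off is that the tree is shared by every task, so this only scales in the number of pixels, not in the number of targets.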
