griddata linear / LinearNDInterpolator unexpected behavior

I was recently trying to speed up my code by reducing the data I feed into griddata with method='linear' (in other words, LinearNDInterpolator). I was greatly surprised to find that this changed my interpolated values significantly.

Reading the docs of LinearNDInterpolator, my understanding of it (and of the linear method of griddata) is the following: the surface is modelled with triangular planes between the points, so interpolation at a new coordinate should depend only on the three nearest data points and their values. Therefore, in principle, I should be able to keep only the three nearest data points for each coordinate of interest. In particular, it shouldn’t matter if I remove data that is nowhere near the area of concern.
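The mental model above can be checked on scattered (non-gridded) data. The following is a minimal sketch, not part of the original report: the sample points, seed, and query position are made up for illustration. It looks up the Delaunay triangle that contains the query point, forms the barycentric weights by hand, and compares the result with LinearNDInterpolator. Note that the three points involved are the vertices of the enclosing triangle, which are not necessarily the three nearest points.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)

# Illustrative scattered data: the four corners of the unit square
# (so the query is guaranteed to lie inside the convex hull) plus random points.
corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
points = np.vstack([corners, rng.random((26, 2))])
values = rng.random(len(points))
query = np.array([[0.5, 0.5]])

tri = Delaunay(points)
simplex = tri.find_simplex(query)[0]   # index of the triangle containing the query
vertices = tri.simplices[simplex]      # indices of that triangle's three corners

# Barycentric coordinates of the query within that triangle.
transform = tri.transform[simplex]
partial = transform[:2] @ (query[0] - transform[2])
weights = np.append(partial, 1.0 - partial.sum())

manual = weights @ values[vertices]
from_scipy = LinearNDInterpolator(tri, values)(query)[0]
print(manual, from_scipy)
```

Both numbers should agree, which confirms that only the enclosing triangle matters. Which triangle that is, however, depends on the triangulation of the whole point set.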

I wrote a “minimal example” that illustrates the problem well.

1. I define an outer grid with 20x20 positions (x and y from -10 to 9) and random values between 0 and 1.
2. From that I copy a smaller 5x5 inner grid with exactly the same values.
3. I define 50 random positions within this inner grid.
4. I perform linear interpolation at the coordinates defined in 3., based on the grids defined in 1. and 2.
5. I measure the difference between the two interpolations.

For many points the two interpolations are identical. However, for some positions there is always a significant difference, on the order of 0.2 or 0.5.

import numpy as np
import pandas as pd
from scipy.interpolate import griddata

low, high = -10, 10
low_in, high_in = -2, 2

points_huge = pd.DataFrame(
    [[i, j, np.random.rand()] for i in range(low, high) for j in range(low, high)],
    columns=['x', 'y', 'value']
)

points_in = points_huge[
    (points_huge.x >= low_in) &
    (points_huge.x <= high_in) &
    (points_huge.y >= low_in) &
    (points_huge.y <= high_in)
].copy()

eval_rand = pd.DataFrame(
    [[
        low_in + (high_in - low_in) * np.random.rand() * 0.9,
        low_in + (high_in - low_in) * np.random.rand() * 0.9
    ] for i in range(50)],
    columns=['x', 'y']
)

eval_rand['value_huge'] = griddata(
    points_huge[['x', 'y']],
    points_huge.value,
    eval_rand[['x', 'y']],
    method='linear'
)

eval_rand['value_in'] = griddata(
    points_in[['x', 'y']],
    points_in.value,
    eval_rand[['x', 'y']],
    method='linear'
)

print(max(abs(eval_rand.value_huge-eval_rand.value_in)))

I saw that the interpolation internally relies on Qhull, and found on the Qhull page that triangulation of non-convex surfaces is not supported. Does that mean my random surface can’t be interpolated correctly?
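A likely explanation that does not involve convexity is that on a regular grid the four corners of every cell lie on a common circle, so the Delaunay triangulation is not unique: each square may be split along either diagonal, and the choice can differ between the large and the small point set. The sketch below is a hand-made illustration that does not call Qhull. The corner values, the query point, and the helper interpolate_in_triangle are made up for this example. It interpolates the same point in one unit cell under both possible splits.

```python
import numpy as np


def interpolate_in_triangle(vertices, values, query):
    """Linearly interpolate at `query` inside one triangle via barycentric weights."""
    vertices = np.asarray(vertices, dtype=float)
    # Solve for weights w with sum(w) == 1 and sum(w_i * vertex_i) == query.
    system = np.vstack([vertices.T, np.ones(3)])
    rhs = np.append(np.asarray(query, dtype=float), 1.0)
    weights = np.linalg.solve(system, rhs)
    return float(weights @ np.asarray(values, dtype=float))


# One grid cell with illustrative values at its four corners.
corner_values = {(0, 0): 0.0, (1, 0): 1.0, (1, 1): 0.0, (0, 1): 1.0}
query = (0.6, 0.3)

# Split along the diagonal (0,0)-(1,1): the query lies in the lower-right triangle.
tri_a = [(0, 0), (1, 0), (1, 1)]
# Split along the diagonal (1,0)-(0,1): the query lies in the lower-left triangle.
tri_b = [(0, 0), (1, 0), (0, 1)]

value_a = interpolate_in_triangle(tri_a, [corner_values[v] for v in tri_a], query)
value_b = interpolate_in_triangle(tri_b, [corner_values[v] for v in tri_b], query)
print(value_a, value_b)  # about 0.3 and 0.9
```

Both splits are valid Delaunay triangulations of the cell, yet the interpolated values differ by 0.6, which is the same order of magnitude as the discrepancies reported above.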

Top GitHub Comments

ev-br commented, Nov 16, 2022

To clarify the “good first issue” label: I think a helpful thing to do would be to add a .. note:: admonition to the Notes section of the griddata docstring, https://scipy.github.io/devdocs/reference/generated/scipy.interpolate.griddata.html#scipy.interpolate.griddata. It currently says “For data on a regular grid use interpn instead”. Let’s make this sentence stand out more so that it is more visible in the docs, or maybe even use a .. warning:: so that it’s red. The same note should then be added to the Notes sections of the classes that implement griddata: the NearestND, LinearND and CloughTocher2D interpolators.

However, it’s entirely possible that I’m missing what is best for a first-time user, so if someone has a better idea, please go for it. Pull requests welcome!
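As an illustration of the interpn suggestion, here is a minimal sketch that mirrors the setup of the original example, with the data stored as 2-D arrays rather than a DataFrame. The variable names and the seed are chosen for this sketch only. On a regular grid, interpn with method='linear' uses only the corners of the enclosing grid cell, so cropping the grid should not change the result.

```python
import numpy as np
from scipy.interpolate import interpn

rng = np.random.default_rng(0)

# Outer regular grid: x and y from -10 to 9, random values in [0, 1).
axis_huge = np.arange(-10, 10, dtype=float)
values_huge = rng.random((axis_huge.size, axis_huge.size))

# Inner grid: the subset with -2 <= x, y <= 2 and the same values.
inner = (axis_huge >= -2) & (axis_huge <= 2)
axis_in = axis_huge[inner]
values_in = values_huge[np.ix_(inner, inner)]

# 50 random query positions inside the inner grid.
queries = -2 + 4 * 0.9 * rng.random((50, 2))

result_huge = interpn((axis_huge, axis_huge), values_huge, queries, method='linear')
result_in = interpn((axis_in, axis_in), values_in, queries, method='linear')

print(np.max(np.abs(result_huge - result_in)))
```

The printed maximum difference should be zero up to floating-point rounding. Note that this is multilinear interpolation within each grid cell rather than interpolation on triangles, so the values will generally not match those returned by griddata.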

8FordPrefect8 commented, Nov 14, 2022

Hmm, since I am no expert in either the mathematical or the numerical details of interpolation, I frankly don’t feel confident writing documentation for a widely used mathematical library.
