griddata linear / LinearNDInterpolator unexpected behavior
I was recently trying to speed up my code by reducing the amount of data I feed into griddata with method='linear' (in other words, LinearNDInterpolator). I was surprised to find that this significantly altered my interpolated values.
From reading the LinearNDInterpolator docs, my understanding of the linear method of griddata is the following: the surface is modelled with triangular planes between the points, so interpolation at a new coordinate should depend only on the three nearest data points and their values. In principle, then, I should be able to keep only the three nearest data points for each coordinate of interest. In particular, it shouldn’t matter if I remove data that is nowhere near the area of concern.
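To make the "triangular planes" picture concrete, here is a minimal sketch of piecewise-linear interpolation done by hand with barycentric weights on a Delaunay triangulation. The sample data, the helper name `barycentric_interp`, and the choice of query point are illustrative assumptions, not part of the original report. Note that the three points involved are the vertices of the enclosing Delaunay triangle, which need not be the three nearest data points.

```python
import numpy as np
from scipy.spatial import Delaunay
from scipy.interpolate import LinearNDInterpolator

rng = np.random.default_rng(0)

# Scattered (non-gridded) sample points and values; purely illustrative data.
pts = rng.random((30, 2))
vals = rng.random(30)

tri = Delaunay(pts)
interp = LinearNDInterpolator(tri, vals)  # reuse the same triangulation


def barycentric_interp(p):
    """Interpolate at p from the three vertices of its enclosing Delaunay triangle."""
    s = int(tri.find_simplex(p))
    if s < 0:
        return np.nan  # p lies outside the convex hull of the data
    T = tri.transform[s]
    b = T[:2] @ (p - T[2])                 # first two barycentric coordinates
    weights = np.append(b, 1.0 - b.sum())  # the third follows from sum == 1
    return float(weights @ vals[tri.simplices[s]])


# The centroid of the data always lies inside the convex hull.
query = pts.mean(axis=0)
manual = barycentric_interp(query)
library = float(interp(query[None, :])[0])
print(manual, library)
```

Under these assumptions the hand-computed value and the `LinearNDInterpolator` result should agree to floating-point precision, since both use the same triangle and the same weights.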
I wrote a “minimal example” that illustrates the problem well.
1. I define an outer grid with 20x20 positions (integer coordinates from -10 to 9) and random values between 0 and 1.
2. From that I copy a smaller 5x5 inner grid with exactly the same values.
3. I define 50 random positions within this inner grid.
4. I perform linear interpolation at the coordinates defined in 3, based on the maps defined in 1 and 2.
5. I measure the difference between the two interpolations.
For many points the two interpolations are identical. However, there are always some positions where they differ significantly, by as much as 0.2 or 0.5.
```python
import numpy as np
import pandas as pd
from scipy.interpolate import griddata

low, high = -10, 10
low_in, high_in = -2, 2

# Outer grid: integer coordinates with random values in [0, 1).
points_huge = pd.DataFrame(
    [[i, j, np.random.rand()] for i in range(low, high) for j in range(low, high)],
    columns=['x', 'y', 'value']
)

# Inner grid: the subset of the outer grid within [low_in, high_in], same values.
points_in = points_huge[
    (points_huge.x >= low_in) &
    (points_huge.x <= high_in) &
    (points_huge.y >= low_in) &
    (points_huge.y <= high_in)
].copy()

# 50 random evaluation positions inside the inner grid.
eval_rand = pd.DataFrame(
    [[
        low_in + (high_in - low_in) * np.random.rand() * 0.9,
        low_in + (high_in - low_in) * np.random.rand() * 0.9
    ] for i in range(50)],
    columns=['x', 'y']
)

# Interpolate from the outer grid.
eval_rand['value_huge'] = griddata(
    points_huge[['x', 'y']],
    points_huge.value,
    eval_rand[['x', 'y']],
    method='linear'
)

# Interpolate from the inner grid.
eval_rand['value_in'] = griddata(
    points_in[['x', 'y']],
    points_in.value,
    eval_rand[['x', 'y']],
    method='linear'
)

# Largest absolute difference between the two interpolations.
print(max(abs(eval_rand.value_huge - eval_rand.value_in)))
```
I saw that the interpolation is done internally by Qhull, and I found on the Qhull page that triangulation of non-convex surfaces is not supported. Does that mean my random surface can’t be interpolated correctly?
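One way to probe the discrepancy is to check whether each evaluation point ends up in the same triangle in both triangulations. The four corners of every square in a regular grid lie on a common circle, so the Delaunay triangulation is not unique there: either diagonal gives a valid split. The sketch below assumes (this is not confirmed in the thread) that the diagonal chosen for a given cell can depend on the full point set. It rebuilds the two grids with NumPy rather than pandas, and the helper `enclosing_vertices` is a hypothetical name introduced for illustration.

```python
import numpy as np
from scipy.spatial import Delaunay
from scipy.interpolate import LinearNDInterpolator

rng = np.random.default_rng(0)

# Regular grids mirroring the example above: full 20x20 and cropped 5x5.
xs = np.arange(-10, 10)
gx, gy = np.meshgrid(xs, xs, indexing="ij")
pts_full = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
vals_full = rng.random(len(pts_full))

inside = np.all(np.abs(pts_full) <= 2, axis=1)
pts_in, vals_in = pts_full[inside], vals_full[inside]

tri_full, tri_in = Delaunay(pts_full), Delaunay(pts_in)

# Evaluation points well inside the cropped grid.
query = rng.uniform(-2, 1.6, size=(50, 2))


def enclosing_vertices(tri, q):
    """Per query point, the sorted vertex coordinates of its enclosing triangle."""
    simplex = tri.find_simplex(q)
    verts = tri.points[tri.simplices[simplex]]  # shape (n, 3, 2)
    return [sorted(map(tuple, v)) for v in verts]


same_triangle = np.array([
    a == b
    for a, b in zip(enclosing_vertices(tri_full, query),
                    enclosing_vertices(tri_in, query))
])

diff = np.abs(
    LinearNDInterpolator(tri_full, vals_full)(query)
    - LinearNDInterpolator(tri_in, vals_in)(query)
)

print("points with a different enclosing triangle:", int((~same_triangle).sum()))
print("max difference where triangles agree:", diff[same_triangle].max(initial=0.0))
print("max difference overall:", diff.max())
```

Wherever both triangulations place a point in the same triangle, the two interpolated values agree to rounding error. Any large differences are therefore confined to points whose enclosing triangle differs between the two triangulations.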
To clarify the “good first issue” label: I think a helpful thing to do would be to add a `.. note::` admonition to the Notes section of the `griddata` docstring, https://scipy.github.io/devdocs/reference/generated/scipy.interpolate.griddata.html#scipy.interpolate.griddata. It currently says “For data on a regular grid use [interpn] instead”; let’s make this sentence stand out more, so that it is more visible in the docs. Or maybe even a `.. warning::` so that it’s red. The same note should then be added to the Notes sections of the griddata implementation classes, the `NearestND`, `LinearND` and `CloughTocher2D` interpolators.

However, it’s entirely possible that I’m missing what is best for a first-time user, so if someone has a better idea, please go for it. Pull requests welcome!
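As a rough illustration of the docstring's advice, the sketch below applies `scipy.interpolate.interpn` to data on a regular grid shaped like the one in the report. The array names and the seed are assumptions made for the example. With `method="linear"`, `interpn` interpolates within the grid cell that contains each point, so cropping the grid to the region of interest should leave the results unchanged.

```python
import numpy as np
from scipy.interpolate import interpn

rng = np.random.default_rng(0)

# Full regular grid: integer coordinates -10..9 along both axes.
axis_full = np.arange(-10, 10)
values_full = rng.random((axis_full.size, axis_full.size))  # values_full[i, j] at (x_i, y_j)

# Cropped grid: coordinates -2..2 with the same values.
keep = (axis_full >= -2) & (axis_full <= 2)
axis_in = axis_full[keep]
values_in = values_full[np.ix_(keep, keep)]

# Evaluation points inside the cropped grid.
xi = rng.uniform(-2, 1.6, size=(50, 2))

full = interpn((axis_full, axis_full), values_full, xi, method="linear")
crop = interpn((axis_in, axis_in), values_in, xi, method="linear")

max_diff = np.max(np.abs(full - crop))
print(max_diff)
```

Here the difference between the full and cropped results should be at the level of floating-point rounding, since no triangulation is involved.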
Hmm, since I am no expert in either the mathematical or the numerical details of interpolation, I frankly don’t feel confident writing documentation for a widely used mathematical library.