ENH: Solving two-step interpolation in scipy
Is your feature request related to a problem? Please describe.
This is being cross-posted here from SO.
This problem is a little complicated, so I will try to make it as simple as possible.
Say that I have a table of values on which I want to perform two-step 1d interpolation, i.e., interpolation of a 2d value by interpolating in 1d twice. Note that this is a different problem from the one solved by scipy.interpolate.interp2d, and the appropriate algorithm is going to be quite different.
Here is an example of source data for performing the interpolation. The column labels are the dependent variable, Z. The row labels are the SECOND independent variable, Y. The 2d, interior portion of the table is the FIRST independent variable, X.
Z    1  2  3
Y
1    1  2  3   <-- X row 1
2    3  4  5   <-- X row 2
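The examples further down refer to X, Y and Z without defining them. A minimal sketch of the table as NumPy arrays, assuming the names X, Y and Z map onto the table exactly as labelled:

```python
import numpy as np

# Z labels the columns, Y labels the rows, X fills the interior of the table.
Z = np.array([1.0, 2.0, 3.0])
Y = np.array([1.0, 2.0])
X = np.array([[1.0, 2.0, 3.0],    # X row 1 (y = 1)
              [3.0, 4.0, 5.0]])   # X row 2 (y = 2)

# One row of X per y value, one column of X per z value.
assert X.shape == (Y.size, Z.size)
```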
We want to produce a factory function. The factory produces another function which, given the above source data, performs the appropriate two-step interpolation of any x,y pair.
Describe the solution you’d like.
Here I will call the factory function twice_interp1d_with_2d_x, and it will look like this:
def twice_interp1d_with_2d_x(x, y, z):
    """Produce an interpolating function of the form:
        f(x,y)
    The function interpolates an x and y value to a z value.
    Expects a 2d x argument. The y-axis is expected to be the rows; z is the columns."""
    ...
Here is the same function with type information attached (all variables are array_like; x is expected to be 2d, y and z are 1d):
def twice_interp1d_with_2d_x(x: array_like, y: array_like, z: array_like) -> Callable[[array_like, array_like], array_like]: ...
Describe alternatives you’ve considered.
Using the existing scipy.interpolate.interp1d function (which includes additional bounds_error and fill_value arguments), I have already written this function (but it is broken; see below), and the basic solution works like this:
from scipy.interpolate import interp1d

def twice_interp1d_with_2d_x(x, y, z, bounds_error=True, fill_value=None):
    # Step 1: build one x -> z interpolator per row of the 2d x table.
    interp1d_rows = [interp1d(row, z, bounds_error=bounds_error, fill_value=fill_value) for row in x]

    def interpolator(x_inner, y_inner):
        # Evaluate every row at x_inner, then interpolate those z values along y.
        temp_z = [f(x_inner) for f in interp1d_rows]
        return interp1d(y, temp_z, bounds_error=bounds_error, fill_value=fill_value)(y_inner)

    return interpolator
This works for many situations:
>>> twice_interp1d_with_2d_x(X, Y, Z)(3, 2)
array(1.)
Side note: interp1d always produces numpy arrays; this result is a 0d (i.e., scalar) array.
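If a plain Python number is wanted instead of a 0d array, it can be unwrapped. A small sketch, using np.array(1.0) as a stand-in for the interpolator's return value rather than calling interp1d itself:

```python
import numpy as np

result = np.array(1.0)    # stand-in for the 0d array shown above

print(result.ndim)        # number of dimensions of the array
value = result.item()     # unwrap into a plain Python float
print(type(value), value)
```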
Additional context (e.g. screenshots, GIFs)
But there is a big problem. Consider the case of (x=4, y=2). The solution to this case should be:
>>> twice_interp1d_with_2d_x(X, Y, Z)(4, 2)
array(2.)
But instead, we get a ValueError because the value of 4 is beyond the upper bound of the first row in X. We can try to fix this by passing bounds_error=False, but then we just get a nan:
>>> twice_interp1d_with_2d_x(X, Y, Z, bounds_error=False)(4, 2)
array(nan)
To make matters even more complicated, sometimes an array will be passed as the first argument or second argument, or BOTH arguments, and we might have to deal with np.nan for some combinations, but not others:
>>> twice_interp1d_with_2d_x(X, Y, Z, bounds_error=False)([3, 4], 2)
array([nan, 2.])
The question is: is there a simple, concise way to produce a correct version of this function using modern idiomatic python and numpy that I am overlooking? I know I could solve this using line after line of logic, but that is going to be hours of work. Is this solvable in some quick, elegant way that I am not thinking of?
If not, it seems to me something like this would be a good candidate to be added to the scipy.interpolate module. Note that I have limited the problem description to the case of a table with a 2d x value; however, it could just as easily be a 2d y or 2d z value (but the case of the z value is simple: no boundary error problem exists).
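One possible way around the spurious nan, offered as a sketch under assumptions and not as an existing SciPy feature: do the row-wise step with np.interp, then blend only the two rows that bracket y, ignoring a nan neighbour when its weight is exactly zero. The name twice_interp1d_nan_aware is hypothetical, and the sketch assumes that y and every row of x are strictly increasing and that the table has at least two rows.

```python
import numpy as np


def twice_interp1d_nan_aware(x, y, z):
    """Hypothetical sketch: two-step linear interpolation that skips an
    out-of-range row whenever that row has zero weight in the y step."""
    x = np.asarray(x, dtype=float)    # shape (n_y, n_z), rows increasing
    y = np.asarray(y, dtype=float)    # shape (n_y,), increasing
    z = np.asarray(z, dtype=float)    # shape (n_z,)

    def interpolator(x_inner, y_inner):
        # Broadcast scalars/arrays against each other, then work on flat copies.
        x_b, y_b = np.broadcast_arrays(x_inner, y_inner)
        shape = x_b.shape
        x_q = x_b.ravel().astype(float)
        y_q = y_b.ravel().astype(float)

        # Step 1: x -> z along every row; nan marks "outside this row".
        z_rows = np.array([np.interp(x_q, row, z, left=np.nan, right=np.nan)
                           for row in x])              # shape (n_y, n_q)

        # Step 2: find the two rows bracketing each y and the linear weight.
        hi = np.clip(np.searchsorted(y, y_q), 1, y.size - 1)
        lo = hi - 1
        w_hi = (y_q - y[lo]) / (y[hi] - y[lo])
        cols = np.arange(x_q.size)
        z_lo, z_hi = z_rows[lo, cols], z_rows[hi, cols]

        # A nan neighbour is ignored only when its weight is exactly zero.
        blend = (1.0 - w_hi) * z_lo + w_hi * z_hi
        result = np.where(w_hi == 1.0, z_hi,
                          np.where(w_hi == 0.0, z_lo, blend))

        # y queries outside the table become nan, like bounds_error=False.
        result = np.where((y_q < y[0]) | (y_q > y[-1]), np.nan, result)
        return result.reshape(shape)

    return interpolator


f = twice_interp1d_nan_aware([[1, 2, 3], [3, 4, 5]], [1, 2], [1, 2, 3])
print(f(4, 2))
print(f([3, 4], 2))
```

With the example table this gives 2.0 for (4, 2) and [1.0, 2.0] for ([3, 4], 2). A query such as (4, 1.5) still yields nan, because row 1 does not cover x=4 and carries nonzero weight; whether that is the desired behaviour is a design choice this sketch does not settle.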
Issue Analytics
- State:
- Created 2 years ago
- Comments:16 (7 by maintainers)
OK, at this stage ISTM that the original query has been responded to (even if not in a fully complete way), a possibly full solution can be cooked up using tools scipy.interpolate provides already, and there does not seem to be much to track for a possible enhancement. Closing, but do feel free to keep discussing.
It seems to me that what you are trying to achieve, at least in the way that you described it, is similar to an inverse problem for interp2d. Using interp2d, the logical approach would be to find an interpolated value x within X given y and z coordinates that belong, respectively, to Y and Z. However, here, it looks like you already have x, and you also know at which y it was sampled. Thereby, you are asking the question: given y, at which value z of Z should I interpolate X to obtain x? At least, that is the way I see it from the summary you gave. If this is correct, I am tempted to say this is a fair question, but I am not sure if it is in the current scope of SciPy. One limitation though is that it seems you can have repeated values in Z, which is problematic.
Regarding your issue with nan values, I have suggested a solution on StackOverflow, which I am cross-posting here to illustrate that it seems to work for the test case you posted above.