pandas interpolate inconsistent results with axis and method ffill

Problem: I am trying to interpolate an m×n DataFrame that contains NaN values. The default axis is 0 (index), so interpolating along axis 0 should fill a missing value at index n in each column from the values at index n-1 and n+1 (or n+p, where n+p is the closest index with a valid value). This holds for the default linear method but not for the ffill method.

Code:

import numpy as np
import pandas as pd

data = np.array([[1,2,3,4, np.nan, 5], [2,4,6,np.nan, 8, 10], [3, 6, 9, np.nan, np.nan, 30]]).T
d = pd.DataFrame(data, columns=['A', 'B', 'C'])
d
     A     B     C
0  1.0   2.0   3.0
1  2.0   4.0   6.0
2  3.0   6.0   9.0
3  4.0   NaN   NaN
4  NaN   8.0   NaN
5  5.0  10.0  30.0

d.interpolate(method='ffill', axis=1)
     A     B     C
0  1.0   2.0   3.0
1  2.0   4.0   6.0
2  3.0   6.0   9.0
3  4.0   6.0   9.0
4  4.0   8.0   9.0
5  5.0  10.0  30.0

d.interpolate(method='ffill')
     A     B     C
0  1.0   2.0   3.0
1  2.0   4.0   6.0
2  3.0   6.0   9.0
3  4.0   4.0   4.0
4  NaN   8.0   8.0
5  5.0  10.0  30.0

d.interpolate()
     A     B     C
0  1.0   2.0   3.0
1  2.0   4.0   6.0
2  3.0   6.0   9.0
3  4.0   7.0  16.0
4  4.5   8.0  23.0
5  5.0  10.0  30.0
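The linear result in the question can be checked by hand. The sketch below rebuilds the same frame, calls `interpolate()` with its defaults, and compares column C against a manual calculation that treats the rows as equally spaced. The variable names `linear`, `step` and `by_hand` are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# Same frame as in the question.
data = np.array([[1, 2, 3, 4, np.nan, 5],
                 [2, 4, 6, np.nan, 8, 10],
                 [3, 6, 9, np.nan, np.nan, 30]]).T
d = pd.DataFrame(data, columns=["A", "B", "C"])

# Default call: method='linear', axis=0 (fill down each column).
linear = d.interpolate()

# Manual check for column C: valid neighbours are 9.0 (row 2) and 30.0 (row 5).
left, right = d.loc[2, "C"], d.loc[5, "C"]
step = (right - left) / (5 - 2)            # equal spacing between rows
by_hand = [left + step, left + 2 * step]   # values for rows 3 and 4

print(linear["C"].tolist())
print(by_hand)
```

Rows 3 and 4 of column C land a third and two thirds of the way between 9.0 and 30.0, which is where the 16.0 and 23.0 in the question's last output come from.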


Top GitHub Comments

CloseChoice commented, May 3, 2020

To explain: ffill takes the last valid entry along the given axis and uses it to fill the NaNs that follow.

   A   B   C
0 1.0 2.0 3.0
1 2.0 4.0 6.0
2 3.0 6.0 9.0
3 4.0 NaN NaN
4 NaN 8.0 NaN
5 5.0 10.0 30.0

axis=0 means going along the indices and taking the last valid entry, so the expected result is:

df.interpolate(method='ffill', axis=0)
   A   B   C
0 1.0 2.0 3.0
1 2.0 4.0 6.0
2 3.0 6.0 9.0
3 4.0 4.0 4.0
4 NaN 8.0 8.0
5 5.0 10.0 30.0

axis=1 means going along the columns:

df.interpolate(method='ffill', axis=1)
   A   B   C
0 1.0 2.0 3.0
1 2.0 4.0 6.0
2 3.0 6.0 9.0
3 4.0 6.0 9.0
4 4.0 8.0 9.0
5 5.0 10.0 30.0
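One way to cross-check the two results above is `DataFrame.ffill`, which the thread does not call. It is used here on the assumption that it performs the same forward fill with an explicit axis. Note that `DataFrame.ffill(axis=0)` fills down each column and `axis=1` fills across each row, so its axis labels are the opposite of the labels on the `interpolate(method='ffill', ...)` outputs quoted in this thread. How `interpolate` maps `axis` for `ffill` may differ between pandas versions, so it is worth running both calls on your installed version. The names `down_columns` and `across_rows` are illustrative.

```python
import numpy as np
import pandas as pd

# Same frame as in the question.
data = np.array([[1, 2, 3, 4, np.nan, 5],
                 [2, 4, 6, np.nan, 8, 10],
                 [3, 6, 9, np.nan, np.nan, 30]]).T
d = pd.DataFrame(data, columns=["A", "B", "C"])

# axis=0: each NaN takes the last valid value above it in the same column.
down_columns = d.ffill(axis=0)

# axis=1: each NaN takes the last valid value to its left in the same row.
across_rows = d.ffill(axis=1)

print(down_columns)
print(across_rows)
```

With this frame, `down_columns` reproduces the table shown above for `axis=1`, and `across_rows` reproduces the table shown for `axis=0`, including the NaN that stays in column A of row 4 because nothing lies to its left.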

But consider, for example, only the last three rows:

df
     A     B     C
3  4.0   NaN   NaN
4  NaN   8.0   NaN
5  5.0  10.0  30.0

df.interpolate(method='ffill', axis=1)
     A     B    C
3  4.0   NaN   NaN
4  4.0   8.0   NaN
5  5.0  10.0  30.0
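The reduced example shows that a forward fill has nothing to copy when a NaN has no earlier valid entry along the fill direction. The sketch below reproduces that output on rows 3 to 5 with `DataFrame.ffill(axis=0)`, which fills down each column. `DataFrame.ffill` is not used in the thread and stands in here as an assumption, since the axis labelling of `interpolate(method='ffill', ...)` may depend on the pandas version. The names `sub` and `filled` are illustrative.

```python
import numpy as np
import pandas as pd

# Rows 3 to 5 of the original frame, keeping the original index labels.
sub = pd.DataFrame(
    {
        "A": [4.0, np.nan, 5.0],
        "B": [np.nan, 8.0, 10.0],
        "C": [np.nan, np.nan, 30.0],
    },
    index=[3, 4, 5],
)

# Fill down each column: a NaN is replaced only if a valid value sits above it.
filled = sub.ffill(axis=0)

print(filled)
```

Only the NaN in column A is filled, because 4.0 sits above it. The leading NaNs in columns B and C have no preceding value within this sub-frame, so they remain NaN.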

saarahrasheed commented, May 7, 2020

Hi. Thanks for the prompt responses and explanation. I believe you’re right. Sorry, the confusion was on my end! Thanks again.
