pandas.DataFrame.interpolate() does not extrapolate even when asked to, depending on the interpolation method

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

a = pd.Series([0, 1, np.nan, 3, 4, np.nan, np.nan, np.nan, np.nan])
a_int = a.interpolate(method='cubic', limit_area=None)

Problem description

Some of the offered methods (apparently all of those backed by scipy's interp1d) do not extrapolate over trailing np.nan values. However, the limit_area parameter of df.interpolate() suggests that extrapolation can be requested, since limit_area=None places no restriction on which NaNs are filled. Combining limit_area=None with a method that cannot extrapolate should at least raise a warning.

There used to be a similar issue where extrapolation over trailing NaN was done unintentionally, so maybe the fix for that overdid it. https://github.com/pandas-dev/pandas/issues/8000

Expected Output

The trailing NaNs should be filled by extrapolation. A different method, such as pchip, achieves this.
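
The difference between the methods can be seen side by side. The following is a minimal sketch, assuming scipy is installed and that pandas forwards the fill_value keyword to scipy's interp1d (as the second comment in this thread relies on). The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

a = pd.Series([0, 1, np.nan, 3, 4, np.nan, np.nan, np.nan, np.nan])

# Default cubic: the interior gap is filled, the trailing NaNs are reported
# to stay NaN (the behavior described in this issue).
cubic_default = a.interpolate(method="cubic")

# pchip: reported in this issue to fill the trailing NaNs as well.
pchip = a.interpolate(method="pchip")

# Cubic again, but asking the underlying scipy routine to extrapolate.
cubic_extrap = a.interpolate(method="cubic", fill_value="extrapolate")

print(pd.DataFrame({
    "cubic_default": cubic_default,
    "pchip": pchip,
    "cubic_extrap": cubic_extrap,
}))
```

If the assumptions hold, only the cubic_default column should still contain NaN in its last four rows.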

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.2.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : None.None

pandas : 0.25.3 (also tested with 1.0.0)
numpy : 1.15.4
pytz : 2018.9
dateutil : 2.7.5
pip : 20.0.2
setuptools : 41.0.1
Cython : 0.29.15
pytest : None
hypothesis : None
sphinx : 1.8.3
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.3.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : 7.5.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.3.3
matplotlib : 3.0.3
numexpr : None
odfpy : None
openpyxl : 2.5.12
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.2.1
sqlalchemy : None
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None

Issue Analytics

  • State: open
  • Created 4 years ago
  • Reactions: 9
  • Comments: 10 (2 by maintainers)

Top GitHub Comments

10 reactions
fercook commented, Apr 19, 2020

I second this.

Also, even when it works, it doesn’t. The implied meaning of “extrapolate” is that it will continue on the last available trend. However, the observed result is that the last value is repeated.

In:

a = pd.Series([0, 1, np.nan, 3, 4, np.nan, np.nan, np.nan, np.nan])
a.interpolate(method='linear', limit_area=None)

Out:

0    0.0
1    1.0
2    2.0
3    3.0
4    4.0
5    4.0
6    4.0
7    4.0
8    4.0
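
One way to get the "continue the last trend" behavior described above without relying on interpolate() is to extend the slope by hand. This is only a sketch: extend_last_trend is a hypothetical helper, not part of pandas, and it treats the values as equally spaced (by position), as method='linear' does.

```python
import numpy as np
import pandas as pd


def extend_last_trend(s: pd.Series) -> pd.Series:
    """Hypothetical helper: fill interior gaps linearly, then continue the
    slope of the last two valid points over the trailing NaNs."""
    # Fill only the gaps surrounded by valid values; trailing NaNs stay NaN.
    filled = s.interpolate(method="linear", limit_area="inside")
    values = filled.to_numpy(dtype=float, copy=True)

    valid_pos = np.flatnonzero(~np.isnan(values))
    if valid_pos.size < 2:
        return filled  # not enough points to define a trend

    prev, last = valid_pos[-2], valid_pos[-1]
    slope = (values[last] - values[prev]) / (last - prev)

    # Positions after the last valid value get the extended straight line.
    tail = np.arange(last + 1, len(values))
    values[tail] = values[last] + slope * (tail - last)
    return pd.Series(values, index=s.index, name=s.name)


a = pd.Series([0, 1, np.nan, 3, 4, np.nan, np.nan, np.nan, np.nan])
result = extend_last_trend(a)
print(result)
```

For the Series above, the trailing values continue as 5.0, 6.0, 7.0, 8.0 instead of repeating 4.0.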

5 reactions
zhihua-zheng commented, Oct 29, 2021

@khaeru @lyndonchan To extrapolate in both directions, use limit_direction="both", which is not obvious at all.

import numpy as np
import pandas as pd

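# A 1-D Series with missing external values
# (np.nan is used because the np.NaN alias was removed in NumPy 2.0)
x = [0.5, 1, 2, 3, 20]
y = [np.nan, 1, 4, 9, np.nan]
s = pd.Series(y, index=x)

# Expected usage: fill_value is passed through to scipy, and
# limit_direction="both" also fills the leading NaN
kw = dict(method="quadratic", fill_value="extrapolate", limit_direction="both")
s.interpolate(**kw)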

This gives:

0.5       0.25
1.0       1.00
2.0       4.00
3.0       9.00
20.0    400.00
dtype: float64