Enhancement Request: control extrapolation on .interpolate
Code Sample, a copy-pastable example if possible
import pandas as pd
import numpy as np

dfMain = pd.DataFrame({
    'a': [0, 1, np.nan, 3, 4],
    'b': [np.nan, np.nan, np.nan, 3, 4],
    'c': [0, 1, 2, 3, np.nan]})

# Interpolate each column only between its first and last valid values,
# so leading and trailing NaNs are left untouched.
for col in dfMain:
    start = dfMain[col].first_valid_index()
    end = dfMain[col].last_valid_index()
    dfMain.loc[start:end, col] = dfMain.loc[start:end, col].interpolate()

print(dfMain)
Problem description
It would be very nice to have a limit_direction=‘inside’ that would make interpolate only fill values that are surrounded (both in front and behind) with valid values.
This would allow an interpolate to only fill missing values in a series and not extend the series beyond its original limits. The key here is that it is sometimes important to maintain the original range of a series, but still fill in the gaps.
The example emulates an ‘inside’ interpolation on a simple DataFrame by interpolating each column only between its first and last valid index.
Expected Output
     a    b    c
0  0.0  NaN  0.0
1  1.0  NaN  1.0
2  2.0  NaN  2.0
3  3.0  3.0  3.0
4  4.0  4.0  NaN
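For comparison, the sketch below contrasts the default fill with the requested "inside" behavior. It relies on the limit_area keyword, which later pandas releases (0.23 and newer) accept on interpolate; it is not available in the 0.19.2 version reported in this issue. The variable names default_fill and inside_fill are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [0, 1, np.nan, 3, 4],
    "b": [np.nan, np.nan, np.nan, 3, 4],
    "c": [0, 1, 2, 3, np.nan],
})

# Default linear interpolation fills forward, so the trailing NaN in
# column "c" is replaced by the last valid value (the series is extended).
default_fill = df.interpolate()

# Assumes pandas >= 0.23: limit_area="inside" only fills NaNs that have
# valid values on both sides, leaving leading and trailing NaNs alone.
inside_fill = df.interpolate(limit_area="inside")

print(default_fill)
print(inside_fill)
```

With limit_area="inside" the per-column loop over first_valid_index and last_valid_index is no longer needed, since the original range of each column is preserved by the call itself.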
Output of pd.show_versions()
pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 34.4.1
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: 2.45.0
pandas_datareader: 0.2.1
Issue Analytics
- State:
- Created 6 years ago
- Comments:15 (14 by maintainers)
Top GitHub Comments
So, this kind of already works when you use the scipy methods, since that’s the default for scipy when you extrapolate.
This is an implementation detail that the user shouldn’t need to worry about… But I’m not sure that we can make this consistent across methods in a backwards-compatible way.
@naifrec thanks for the detailed example, I think I understand the behavior you’re looking for.
limit currently has the clearly defined behavior of “fill at most this many NaNs in a row”, which is useful, so we can’t change that. We’ll have to add another keyword to interpolate.
I think we should add an additional option to limit_direction like consecutive (there’s probably a better word; something that describes “all or nothing”).
Could you open up a new issue for this (you can just copy your last message)? This issue is focusing on extrapolation, which is orthogonal to your request.
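To make the difference between the existing limit behavior and the "all or nothing" idea concrete, here is a minimal sketch. The helper interpolate_short_gaps and its max_gap parameter are hypothetical and not part of pandas; the helper also assumes a pandas version that accepts limit_area (0.23 or newer).

```python
import numpy as np
import pandas as pd


def interpolate_short_gaps(s, max_gap):
    """Hypothetical helper: fill an interior run of NaNs only if the
    whole run is at most max_gap values long ("all or nothing")."""
    isna = s.isna()
    # Give every run of consecutive NaN / non-NaN values its own id.
    run_id = isna.astype(int).diff().ne(0).cumsum()
    # Length of the run each element belongs to.
    run_len = run_id.map(run_id.value_counts())
    # Fill interior gaps, then blank out the gaps that were too long.
    filled = s.interpolate(limit_area="inside")
    too_long = isna & (run_len > max_gap)
    return filled.mask(too_long)


s = pd.Series([0, np.nan, 2, np.nan, np.nan, np.nan, 6])

# Existing behavior: limit=1 fills at most one NaN per run, so the
# three-NaN gap ends up partially filled.
partial = s.interpolate(limit=1)

# "All or nothing": the one-NaN gap is filled, the three-NaN gap is
# left untouched because it is longer than max_gap.
all_or_nothing = interpolate_short_gaps(s, max_gap=1)

print(partial)
print(all_or_nothing)
```

The sketch fills everything first and masks afterwards, which keeps it short; a built-in option could instead skip long gaps before interpolating.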