Enhancement Request: control extrapolation on .interpolate

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

dfMain = pd.DataFrame({
    'a': [0, 1, np.nan, 3, 4],
    'b': [np.nan, np.nan, np.nan, 3, 4],
    'c': [0, 1, 2, 3, np.nan]})

# Interpolate each column only between its first and last valid values,
# so leading and trailing NaNs are left untouched.
for col in dfMain:
    start = dfMain[col].first_valid_index()
    end = dfMain[col].last_valid_index()
    dfMain.loc[start:end, col] = dfMain.loc[start:end, col].interpolate()

print(dfMain)

Problem description

It would be very nice to have a limit_direction='inside' option that makes interpolate fill only values that are surrounded (both before and after) by valid values.

This would allow interpolate to fill only the gaps within a series, without extending the series beyond its original limits. The key point is that it is sometimes important to maintain the original range of a series while still filling in the gaps.

The example above applies an 'inside' interpolation to a simple DataFrame.

Expected Output

     a    b    c
0  0.0  NaN  0.0
1  1.0  NaN  1.0
2  2.0  NaN  2.0
3  3.0  3.0  3.0
4  4.0  4.0  NaN
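
The per-column loop in the code sample can also be written as a mask over the whole frame. What follows is only a sketch of the requested behaviour, not pandas API: the helper name `interpolate_inside` is hypothetical, and it assumes linear interpolation on numeric columns.

```python
import numpy as np
import pandas as pd


def interpolate_inside(df, **kwargs):
    """Hypothetical helper: fill only NaNs that have valid values on both sides."""
    valid = df.notna()
    # True from each column's first valid value onward.
    after_first = valid.cumsum().gt(0)
    # True up to each column's last valid value (same trick on the reversed frame).
    before_last = valid.iloc[::-1].cumsum().iloc[::-1].gt(0)
    inside = after_first & before_last
    # Fill in both directions, then blank out everything outside the valid range.
    filled = df.interpolate(limit_direction="both", **kwargs)
    return filled.where(inside)


dfMain = pd.DataFrame({
    "a": [0, 1, np.nan, 3, 4],
    "b": [np.nan, np.nan, np.nan, 3, 4],
    "c": [0, 1, 2, 3, np.nan],
})
result = interpolate_inside(dfMain)
print(result)
```

On the sample frame this should reproduce the expected output shown above. Newer pandas releases also accept a limit_area argument on interpolate that targets this case, so a helper like this is mainly useful as an illustration.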

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-75-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 34.4.1
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: 2.45.0
pandas_datareader: 0.2.1
None

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 15 (14 by maintainers)

Top GitHub Comments

4 reactions
TomAugspurger commented, May 8, 2017

So, this kind of already works when you use the scipy methods, since that’s the default for scipy when you extrapolate:

In [31]: dfMain.interpolate(method='slinear')
Out[31]:
     a    b    c
0  0.0  NaN  0.0
1  1.0  NaN  1.0
2  2.0  NaN  2.0
3  3.0  3.0  3.0
4  4.0  4.0  NaN

This is an implementation detail that the user shouldn’t need to worry about… But I’m not sure that we can make this consistent across methods in a backwards-compatible way.
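
To make the inconsistency between methods concrete, here is a small sketch of what the default method='linear' does on the same frame. The variable name `default_fill` is just for illustration. With default settings the trailing NaN in column c is filled with the last valid value, while the leading NaNs in column b are left alone.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [0, 1, np.nan, 3, 4],
    "b": [np.nan, np.nan, np.nan, 3, 4],
    "c": [0, 1, 2, 3, np.nan],
})

# Default settings: method="linear", filling in the forward direction.
default_fill = df.interpolate()

# The trailing NaN in "c" is filled with the last valid value (3.0) ...
print(default_fill["c"].tolist())
# ... while the leading NaNs in "b" stay NaN.
print(default_fill["b"].tolist())
```

Compared with the slinear output above, the only difference on this frame is the last row of column c.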

1 reaction
TomAugspurger commented, May 23, 2017

@naifrec thanks for the detailed example, I think I understand the behavior you’re looking for.

limit currently has the clearly defined behavior of “fill at most this many NaNs in a row”, which is useful, so we can’t change that. We’ll have to add another keyword to interpolate.

I think we should add an additional option to limit_direction like consecutive (there’s probably a better word. Something that describes “all or nothing”).

Could you open up a new issue for this (you can just copy your last message)? This issue is focusing on extrapolation (which would be orthogonal to this issue).
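
The limit behaviour described in this comment ("fill at most this many NaNs in a row") can be seen in a minimal sketch. The series below is made up for illustration: a gap of three NaNs with limit=1 is only partly filled, which is the opposite of an "all or nothing" rule.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, np.nan, np.nan, np.nan, 4])

# limit=1 fills at most one NaN per consecutive run, so the gap is only partly filled.
partly_filled = s.interpolate(limit=1)
print(partly_filled.tolist())  # [0.0, 1.0, nan, nan, 4.0]
```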
