[Bug/Doc] lowess returns nan and does not warn if there are too few neighbors

Statsmodels version: 0.6.1

Issue: when the number of unique values in x is smaller than the number of unique neighbors needed to compute LOWESS, lowess returns NaNs without raising an error or warning.

Expected behavior: lowess should warn that the fit cannot be computed because there are too few unique neighbors.

(Workaround: you could jitter all the x values by some tiny epsilon to create more unique values, and then rerun lowess(y, x).)

import numpy as np
import statsmodels
from statsmodels.nonparametric.smoothers_lowess import lowess

print(statsmodels.__version__)  # '0.6.1'

# initializing data
x = np.random.choice(range(0, 3), 100) # only 3 unique values: {0, 1, 2}
y = np.random.choice(np.arange(0, 1, 0.1), 100)

preds = lowess(y, x)[:, 1] # Slicing to get the predicted y values
# no warning or exception raised, despite the fact that all predictions are nan
print(preds)
>>> array([ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan])

# Trying again, with a few more unique x values
x = np.random.choice(range(0, 5), 100) # now, with 5 unique values
y = np.random.choice(np.arange(0, 1, 0.1), 100)
preds = lowess(y, x)[:, 1] # recompute the predictions for the new data
print(preds)
>>> array([ 0.4558193 ,  0.4558193 ,  0.4558193 ,  0.4558193 ,  0.4558193 ,
        0.4558193 ,  0.4558193 ,  0.4558193 ,  0.4558193 ,  0.4558193 ,
        0.4558193 ,  0.4558193 ,  0.4558193 ,  0.4558193 ,  0.4558193 ,
        0.4558193 ,  0.4558193 ,  0.4558193 ,  0.4558193 ,  0.4558193 ,
        0.4558193 ,  0.4558193 ,  0.42940657,  0.42940657,  0.42940657,
        0.42940657,  0.42940657,  0.42940657,  0.42940657,  0.42940657,
        0.42940657,  0.42940657,  0.42940657,  0.42940657,  0.42940657,
        0.42940657,  0.42940657,  0.42940657,  0.42940657,  0.42940657,
        0.42940657,  0.42940657,  0.42940657,  0.40486366,  0.40486366,
        0.40486366,  0.40486366,  0.40486366,  0.40486366,  0.40486366,
        0.40486366,  0.40486366,  0.40486366,  0.40486366,  0.40486366,
        0.40486366,  0.40486366,  0.40486366,  0.40486366,  0.40486366,
        0.40486366,  0.40486366,  0.40486366,  0.41932243,  0.41932243,
        0.41932243,  0.41932243,  0.41932243,  0.41932243,  0.41932243,
        0.41932243,  0.41932243,  0.41932243,  0.41932243,  0.41932243,
        0.41932243,  0.41932243,  0.41932243,  0.41932243,  0.41932243,
        0.41932243,  0.41932243,  0.41932243,  0.48931377,  0.48931377,
        0.48931377,  0.48931377,  0.48931377,  0.48931377,  0.48931377,
        0.48931377,  0.48931377,  0.48931377,  0.48931377,  0.48931377,
        0.48931377,  0.48931377,  0.48931377,  0.48931377,  0.48931377])
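
The jitter workaround suggested above could look roughly like the following. This is a minimal sketch, not part of statsmodels: the helper name `jitter`, its `eps` and `seed` parameters, and the choice of uniform noise are all assumptions made for illustration, and plain Python is used to keep it dependency-free.

```python
import random


def jitter(values, eps=1e-6, seed=0):
    """Return a copy of `values` with small uniform noise in [-eps, eps] added.

    Hypothetical helper for illustration; `eps` and `seed` are assumed names.
    """
    rng = random.Random(seed)  # seeded so the jitter is reproducible
    return [v + rng.uniform(-eps, eps) for v in values]


# Same situation as in the report: 100 points, only 3 distinct x values.
x = [i % 3 for i in range(100)]
x_jittered = jitter(x)

# Compare the number of distinct x values before and after jittering.
print(len(set(x)), len(set(x_jittered)))
```

The jittered values can then be passed to lowess in place of x, as in lowess(y, x_jittered). Jittering perturbs the data, so eps should stay small relative to the spacing between the original x values.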

Issue Analytics

  • State: open
  • Created 8 years ago
  • Comments: 18 (10 by maintainers)

Top GitHub Comments

1 reaction
josef-pkt commented, May 21, 2021

AFAIU: p_i_j is the projection matrix x (x'x)^{-1} x' specialized to the case with a single regressor and constant, with the addition for weights as in WLS.

(For example, scipy linregress is/was doing something similar, working directly with sums of squares and cross products.)

(I haven’t looked at those special cases in a long time, and didn’t check the details here. The OLS version would be
y_hat = x @ pinv(x) @ y)

0 reactions
kaktus42 commented, May 21, 2021

So, I am trying to understand the implementation, and I agree that something seems wrong in update_neighborhood. I’m not certain what, though.

In the meantime, could someone give me a hint about how calculate_y_fit actually does it? The note just says:

    No regression function (e.g. lstsq) is called. Instead "projection
    vector" p_i_j is calculated, and y_fit[i] = sum(p_i_j * y[j]) = y_fit[i]
    for j s.t. x[j] is in the neighborhood of xval. p_i_j is a function of
    the weights, xval, and its neighbors.

Much appreciated if someone can give me a hint on why this code computes the least-squares fit:

        for j in range(left_end, right_end):
            sum_weighted_x += weights[j] * x[j]
        for j in range(left_end, right_end):
            weighted_sqdev_x += weights[j] * (x[j] - sum_weighted_x) ** 2
        for j in range(left_end, right_end):
            p_i_j = weights[j] * (1.0 + (xval - sum_weighted_x) *
                             (x[j] - sum_weighted_x) / weighted_sqdev_x)
            y_fit[i] += p_i_j * y[j]
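
To make the question above concrete, here is a small plain-Python check, written for illustration and not taken from statsmodels, that the quoted loop reproduces a weighted least squares line evaluated at xval. It assumes the weights are normalized to sum to 1, so that sum_weighted_x is the weighted mean of x. The function names projection_fit and wls_line_fit are hypothetical, and the guard that returns NaN is an added assumption that marks where a zero denominator can arise.

```python
import math


def projection_fit(x, y, weights, xval):
    """Fitted value at xval using the projection-vector loop quoted above."""
    total = sum(weights)
    w = [wj / total for wj in weights]  # assumption: weights sum to 1
    mean_x = sum(wj * xj for wj, xj in zip(w, x))
    sqdev_x = sum(wj * (xj - mean_x) ** 2 for wj, xj in zip(w, x))
    if sqdev_x == 0.0:
        # Every point with nonzero weight shares one x value: 0/0 below.
        return float("nan")
    return sum(
        wj * (1.0 + (xval - mean_x) * (xj - mean_x) / sqdev_x) * yj
        for wj, xj, yj in zip(w, x, y)
    )


def wls_line_fit(x, y, weights, xval):
    """Textbook weighted least squares line (slope and intercept) at xval."""
    total = sum(weights)
    mean_x = sum(wj * xj for wj, xj in zip(weights, x)) / total
    mean_y = sum(wj * yj for wj, yj in zip(weights, y)) / total
    sxx = sum(wj * (xj - mean_x) ** 2 for wj, xj in zip(weights, x))
    sxy = sum(
        wj * (xj - mean_x) * (yj - mean_y)
        for wj, xj, yj in zip(weights, x, y)
    )
    return mean_y + (sxy / sxx) * (xval - mean_x)


# Regular case: both routes should agree up to rounding.
xs, ys, ws = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 2.0, 5.0], [0.5, 1.0, 1.0, 0.5]
print(projection_fit(xs, ys, ws, 1.5), wls_line_fit(xs, ys, ws, 1.5))

# Degenerate case: the only point with a different x has weight 0.
print(projection_fit([0.0, 0.0, 1.0], [0.2, 0.4, 0.9], [1.0, 1.0, 0.0], 0.0))
```

The two routes agree because, with normalized weights, the sum of weights[j] * y[j] is the weighted mean of y, and the remaining term is the weighted slope multiplied by (xval - sum_weighted_x). In the degenerate case every point with nonzero weight shares the same x, so weighted_sqdev_x is 0 and the ratio inside p_i_j is undefined, which is one way a fit on a few discrete x values can end up as NaN.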

Top Results From Across the Web

Bug: lowess with a few discrete x values returns nan
I won't try to figure out the code right now, but a quick look shows that we divide by zero if there is...

statsmodels.nonparametric.smoothers_lowess.lowess
If 'none', no nan checking is done. If 'drop', any observations with nans are dropped. If 'raise', an error is raised. Default is...

Lowess method in statsmodels converts real values to NaN ...
However the ouput of the method converts all the values of the 'Mbp' column to 'nan' . I cannot see the reason why....