[Bug/Doc] lowess returns nan and does not warn if there are too few neighbors

Statsmodels version: 0.6.1

Issue: when the number of unique values in X is less than the number of unique neighbors needed to compute LOWESS, NaNs are returned but no error or warning is given.

Expected behavior: a warning should be raised that LOWESS cannot be computed due to a lack of sufficient unique neighbors.

(Workaround: you could jitter all the x values by some tiny epsilon to create more unique values, and then rerun lowess(y, x).)
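The jitter workaround can be sketched roughly as follows. This is a minimal illustration, not part of statsmodels: the helper name `jitter`, the default `eps`, and the fixed seed are all assumptions chosen for the example.

```python
import random


def jitter(values, eps=1e-6, seed=0):
    """Return a copy of `values` with uniform noise in [-eps, eps] added.

    Hypothetical helper for illustration; `eps` and `seed` are assumed defaults.
    """
    rng = random.Random(seed)
    return [v + rng.uniform(-eps, eps) for v in values]


# Twelve points but only three unique x values, as in the report.
x = [0, 1, 2] * 4
x_jittered = jitter(x)

# Compare the number of distinct values before and after jittering.
print(len(set(x)), len(set(x_jittered)))
```

The jittered values would then be passed on as `lowess(y, x_jittered)`. `eps` should stay small relative to the spacing of the real x values so the noise does not reorder them.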

import statsmodels as sm
print(sm.__version__)  # '0.6.1'
# import from the package name; the alias `sm` cannot be used in a `from` import
from statsmodels.nonparametric.smoothers_lowess import lowess
import numpy as np

# initializing data
x = np.random.choice(range(0, 3), 100) # only 3 unique values: {0, 1, 2}
y = np.random.choice(np.arange(0, 1, 0.1), 100)

preds = lowess(y, x)[:, 1] # Slicing to get the predicted y values
# no warning or exception raised, even though all predictions are nan
print(preds)
>>> array([ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan])

# Trying again, with a few more unique x values
x = np.random.choice(range(0, 5), 100) # now, with 5 unique values
y = np.random.choice(np.arange(0, 1, 0.1), 100)
preds = lowess(y, x)[:, 1] # recompute the predictions for the new data
print(preds)
>>> array([ 0.4558193 ,  0.4558193 ,  0.4558193 ,  0.4558193 ,  0.4558193 ,
        0.4558193 ,  0.4558193 ,  0.4558193 ,  0.4558193 ,  0.4558193 ,
        0.4558193 ,  0.4558193 ,  0.4558193 ,  0.4558193 ,  0.4558193 ,
        0.4558193 ,  0.4558193 ,  0.4558193 ,  0.4558193 ,  0.4558193 ,
        0.4558193 ,  0.4558193 ,  0.42940657,  0.42940657,  0.42940657,
        0.42940657,  0.42940657,  0.42940657,  0.42940657,  0.42940657,
        0.42940657,  0.42940657,  0.42940657,  0.42940657,  0.42940657,
        0.42940657,  0.42940657,  0.42940657,  0.42940657,  0.42940657,
        0.42940657,  0.42940657,  0.42940657,  0.40486366,  0.40486366,
        0.40486366,  0.40486366,  0.40486366,  0.40486366,  0.40486366,
        0.40486366,  0.40486366,  0.40486366,  0.40486366,  0.40486366,
        0.40486366,  0.40486366,  0.40486366,  0.40486366,  0.40486366,
        0.40486366,  0.40486366,  0.40486366,  0.41932243,  0.41932243,
        0.41932243,  0.41932243,  0.41932243,  0.41932243,  0.41932243,
        0.41932243,  0.41932243,  0.41932243,  0.41932243,  0.41932243,
        0.41932243,  0.41932243,  0.41932243,  0.41932243,  0.41932243,
        0.41932243,  0.41932243,  0.41932243,  0.48931377,  0.48931377,
        0.48931377,  0.48931377,  0.48931377,  0.48931377,  0.48931377,
        0.48931377,  0.48931377,  0.48931377,  0.48931377,  0.48931377,
        0.48931377,  0.48931377,  0.48931377,  0.48931377,  0.48931377])
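Until the library warns on its own, a caller-side check can at least flag the all-NaN result shown in the first run. The sketch below is an assumption-laden illustration: `warn_if_all_nan` is a hypothetical helper, not a statsmodels function, and it only inspects the output rather than diagnosing why the fit failed.

```python
import math
import warnings


def warn_if_all_nan(fitted, label="lowess"):
    """Warn and return True when every fitted value is NaN (hypothetical helper)."""
    values = [float(v) for v in fitted]
    all_nan = bool(values) and all(math.isnan(v) for v in values)
    if all_nan:
        warnings.warn(
            f"{label} returned only NaN; x may have too few unique values",
            RuntimeWarning,
            stacklevel=2,
        )
    return all_nan


# Stand-in for `lowess(y, x)[:, 1]` from the report, so the sketch runs
# without statsmodels installed.
preds = [float("nan")] * 5
warn_if_all_nan(preds)
```

With real data the call would be `warn_if_all_nan(lowess(y, x)[:, 1])`. The function returns a boolean so the caller can decide whether to retry, for example with jittered x values.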

Issue Analytics

State:
Created 8 years ago
Comments:18 (10 by maintainers)

Top GitHub Comments

josef-pkt commented, May 21, 2021

AFAIU: p_i_j is the projection matrix x (x'x)^{-1} x' specialized to the case with a single regressor and constant, with the addition for weights as in WLS.

(For example, scipy linregress is/was doing something similar, working directly with sums of squares or cross products.)

(I haven’t looked at those special cases in a long time and didn’t check the details here. The OLS version would be
y_hat = x @ pinv(x) @ y)
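One way to spell out this projection reading for a single regressor plus constant is the derivation below. It is a sketch under an assumption the thread does not state: the neighborhood weights w_j are non-negative and normalized to sum to one. The symbols x_0, \bar{x}, and S are notation introduced here.

```latex
% Assumption: w_j >= 0 and \sum_j w_j = 1 over the neighborhood of x_0.
\begin{align*}
\bar{x} &= \sum_j w_j x_j, \qquad
S = \sum_j w_j (x_j - \bar{x})^2, \\
(\hat a, \hat b) &= \arg\min_{a,b} \sum_j w_j \,(y_j - a - b\,x_j)^2, \\
\hat b &= \frac{1}{S} \sum_j w_j (x_j - \bar{x})\, y_j, \qquad
\hat a = \sum_j w_j y_j - \hat b\, \bar{x}, \\
\hat y(x_0) &= \hat a + \hat b\, x_0
  = \sum_j \underbrace{w_j \left[ 1 + \frac{(x_0 - \bar{x})(x_j - \bar{x})}{S} \right]}_{p_j}\, y_j .
\end{align*}
```

Under that assumption the fitted value at x_0 is a fixed linear combination of the y_j, which is what one row of the weighted projection matrix expresses. The expression for \hat b uses \sum_j w_j (x_j - \bar{x}) = 0, so the weighted mean of y drops out of the slope.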

kaktus42 commented, May 21, 2021

So, I am trying to understand the implementation, and I agree that something seems wrong in update_neighborhood. I’m not certain what, though.

In the meantime, could someone give me a hint about how calculate_y_fit actually works? The note just says:

    No regression function (e.g. lstsq) is called. Instead "projection
    vector" p_i_j is calculated, and y_fit[i] = sum(p_i_j * y[j]) = y_fit[i]
    for j s.t. x[j] is in the neighborhood of xval. p_i_j is a function of
    the weights, xval, and its neighbors.

I would much appreciate a hint on why this code computes the least-squares fit:

        for j in range(left_end, right_end):
            sum_weighted_x += weights[j] * x[j]
        for j in range(left_end, right_end):
            weighted_sqdev_x += weights[j] * (x[j] - sum_weighted_x) ** 2
        for j in range(left_end, right_end):
            p_i_j = weights[j] * (1.0 + (xval - sum_weighted_x) *
                             (x[j] - sum_weighted_x) / weighted_sqdev_x)
            y_fit[i] += p_i_j * y[j]
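A quick numeric check can show why these loops amount to a weighted straight-line fit evaluated at xval. The sketch assumes the weights have already been normalized to sum to one, which the quoted snippet does not show; without that, sum_weighted_x would not be a weighted mean. The function names `projection_fit` and `weighted_line_fit` and the sample data are made up for illustration.

```python
def projection_fit(x, y, weights, xval):
    """Mirror the quoted loops: weighted mean, weighted squared deviation, sum of p_j * y_j."""
    mean_x = sum(w * xj for w, xj in zip(weights, x))
    sqdev_x = sum(w * (xj - mean_x) ** 2 for w, xj in zip(weights, x))
    return sum(
        w * (1.0 + (xval - mean_x) * (xj - mean_x) / sqdev_x) * yj
        for w, xj, yj in zip(weights, x, y)
    )


def weighted_line_fit(x, y, weights, xval):
    """Solve the 2x2 weighted normal equations for intercept and slope, then predict at xval."""
    s_w = sum(weights)
    s_x = sum(w * xj for w, xj in zip(weights, x))
    s_xx = sum(w * xj * xj for w, xj in zip(weights, x))
    s_y = sum(w * yj for w, yj in zip(weights, y))
    s_xy = sum(w * xj * yj for w, xj, yj in zip(weights, x, y))
    det = s_w * s_xx - s_x ** 2
    intercept = (s_xx * s_y - s_x * s_xy) / det
    slope = (s_w * s_xy - s_x * s_y) / det
    return intercept + slope * xval


# Made-up neighborhood; raw weights are normalized so they sum to one (assumption).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 0.5, 2.0, 1.5, 3.0]
raw = [1.0, 2.0, 3.0, 2.0, 1.0]
weights = [r / sum(raw) for r in raw]

a = projection_fit(x, y, weights, xval=1.7)
b = weighted_line_fit(x, y, weights, xval=1.7)
print(abs(a - b) < 1e-9)
```

The two routes agree because, with normalized weights, the intercept and slope of the weighted fit collapse into a single weight p_j per observation. If every point with nonzero weight shares the same x, the weighted squared deviation is zero and the division is undefined; that is one plausible route to the NaN output, though the thread does not confirm it.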