[Bug/Doc] lowess returns nan and does not warn if there are too few neighbors
See original GitHub issue

Statsmodels version: 0.6.1

Issue: when the number of unique values in x is smaller than the number of unique neighbors needed to compute LOWESS, nans are returned but no error or warning is given.

Expected behavior: a warning should be raised that LOWESS cannot be computed due to an insufficient number of unique neighbors.
(Workaround: you could jitter all the x values by some tiny epsilon to create more unique values, and then rerun lowess(y, x).)
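For reference, a minimal sketch of that jitter workaround; the epsilon scale of 1e-6 is an arbitrary choice, anything small relative to the spacing of the x values should do:

import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

x = np.random.choice(range(0, 3), 100)  # only 3 unique values, as in the reproduction below
y = np.random.choice(np.arange(0, 1, 0.1), 100)
# break the ties with a tiny random jitter (epsilon scale chosen arbitrarily)
x_jittered = x + np.random.uniform(-1e-6, 1e-6, size=x.shape)
preds = lowess(y, x_jittered)[:, 1]  # predictions are no longer all nan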
import statsmodels as sm
print(sm.__version__)  # '0.6.1'
from statsmodels.nonparametric.smoothers_lowess import lowess
import numpy as np
# initializing data
x = np.random.choice(range(0, 3), 100) # only 3 unique values: {0, 1, 2}
y = np.random.choice(np.arange(0, 1, 0.1), 100)
preds = lowess(y, x)[:, 1] # Slicing to get the predicted y values
# no warning or exception raised, despite the fact that all predictions are nan
print(preds)
>>> array([ nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan])
# Trying again, with a few more unique x values
x = np.random.choice(range(0, 5), 100) # now, with 5 unique values
y = np.random.choice(np.arange(0, 1, 0.1), 100)
preds = lowess(y, x)[:, 1] # recompute the fit with the new data
print(preds)
>>> array([ 0.4558193 , 0.4558193 , 0.4558193 , 0.4558193 , 0.4558193 ,
0.4558193 , 0.4558193 , 0.4558193 , 0.4558193 , 0.4558193 ,
0.4558193 , 0.4558193 , 0.4558193 , 0.4558193 , 0.4558193 ,
0.4558193 , 0.4558193 , 0.4558193 , 0.4558193 , 0.4558193 ,
0.4558193 , 0.4558193 , 0.42940657, 0.42940657, 0.42940657,
0.42940657, 0.42940657, 0.42940657, 0.42940657, 0.42940657,
0.42940657, 0.42940657, 0.42940657, 0.42940657, 0.42940657,
0.42940657, 0.42940657, 0.42940657, 0.42940657, 0.42940657,
0.42940657, 0.42940657, 0.42940657, 0.40486366, 0.40486366,
0.40486366, 0.40486366, 0.40486366, 0.40486366, 0.40486366,
0.40486366, 0.40486366, 0.40486366, 0.40486366, 0.40486366,
0.40486366, 0.40486366, 0.40486366, 0.40486366, 0.40486366,
0.40486366, 0.40486366, 0.40486366, 0.41932243, 0.41932243,
0.41932243, 0.41932243, 0.41932243, 0.41932243, 0.41932243,
0.41932243, 0.41932243, 0.41932243, 0.41932243, 0.41932243,
0.41932243, 0.41932243, 0.41932243, 0.41932243, 0.41932243,
0.41932243, 0.41932243, 0.41932243, 0.48931377, 0.48931377,
0.48931377, 0.48931377, 0.48931377, 0.48931377, 0.48931377,
0.48931377, 0.48931377, 0.48931377, 0.48931377, 0.48931377,
0.48931377, 0.48931377, 0.48931377, 0.48931377, 0.48931377])
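Until lowess itself raises a warning, a caller can guard against the silent all-nan output. A minimal sketch of such a wrapper (lowess_checked is just an illustrative name, not part of statsmodels):

import warnings
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def lowess_checked(y, x, **kwargs):
    # Run lowess and warn if the fit silently came back as all nan.
    preds = lowess(y, x, **kwargs)[:, 1]
    if np.isnan(preds).all():
        warnings.warn("lowess returned only nan values; x may have too few "
                      "unique values for the chosen frac", RuntimeWarning)
    return preds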
Issue Analytics
- Created 8 years ago
- Comments: 18 (10 by maintainers)
AFAIU: p_i_j is the projection matrix

x (x'x)^{-1} x'

specialized to the case with a single regressor and a constant, with the addition of weights as in WLS. (For example, scipy's linregress is/was doing something similar, working directly with sums of squares or cross products.)

(I haven't looked at those special cases in a long time, and didn't check the details here. The OLS version would be

y_hat = x @ pinv(x) @ y
)
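As a standalone illustration of that projection matrix for one regressor plus a constant with WLS-style weights (just the textbook formula, not the cythonized lowess code):

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 1.5, 1.3, 2.0, 2.2])
w = np.array([0.2, 0.5, 1.0, 0.5, 0.2])       # local weights, e.g. tricube

X = np.column_stack([np.ones_like(x), x])     # constant plus a single regressor
W = np.diag(w)
H = X @ np.linalg.inv(X.T @ W @ X) @ X.T @ W  # weighted projection ("hat") matrix
y_hat = H @ y                                 # with equal weights this reduces to X @ pinv(X) @ y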
So, I am trying to understand the implementation, and I agree that there seems to be something wrong in update_neighborhood. I'm not certain about what, though.

In the meanwhile: is there someone who could give me a hint about how calculate_y_fit is actually doing it? The note just says:

Much appreciated if someone can hint me on why this code is calculating the least squares:
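For context, the linregress-style form mentioned above: a weighted straight-line fit can be computed directly from weighted sums of squares and cross products. Whether calculate_y_fit reduces to exactly this is the open question here; weighted_line_fit_at is only an illustrative name:

import numpy as np

def weighted_line_fit_at(x, y, w, x0):
    # Weighted least squares line fit via weighted sums of squares and
    # cross products, evaluated at the point x0.
    sw = w.sum()
    x_bar = (w * x).sum() / sw
    y_bar = (w * y).sum() / sw
    s_xx = (w * (x - x_bar) ** 2).sum()           # zero when all x in the window are equal
    s_xy = (w * (x - x_bar) * (y - y_bar)).sum()
    slope = s_xy / s_xx                           # 0/0 -> nan when x has no spread
    return y_bar + slope * (x0 - x_bar)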