[BUG] Label propagation sometimes produces label_distributions that contain NaN.
Description
Calling fit on LabelSpreading fails with "invalid value encountered in true_divide".
After convergence, the label distribution for some samples is all zero, so the variable normalizer
in label_propagation.py:291 contains some zero values, causing the division self.label_distributions_ /= normalizer
to produce NaN.
Maybe there is a connection to #8008? On other datasets, increasing the n_neighbors
parameter above its default value made the issue disappear.
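The mechanism described above can be reproduced without scikit-learn. The following is a minimal sketch with a made-up 3-sample, 2-class array; safe_normalize is a hypothetical helper that shows one possible guard and is not necessarily what the library does.

```python
import numpy as np

# Hypothetical label distributions: the middle sample received no label mass,
# so its row is all zeros (the situation described in this report).
label_distributions = np.array([[0.2, 0.6],
                                [0.0, 0.0],
                                [0.5, 0.5]])

# Row sums, kept as a column so the division broadcasts row-wise.
normalizer = label_distributions.sum(axis=1, keepdims=True)

# Naive normalization: 0.0 / 0.0 yields NaN for the all-zero row.
with np.errstate(invalid="ignore"):
    naive = label_distributions / normalizer


def safe_normalize(dist):
    """Normalize rows to sum to 1, leaving all-zero rows as zeros (assumed guard)."""
    row_sums = dist.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # avoid 0/0; the row stays all zero
    return dist / row_sums


safe = safe_normalize(label_distributions)
print(naive)
print(safe)
```

With this guard, a sample that no labeled point reaches keeps an all-zero distribution instead of NaN; whether that is the right semantics for such samples is a separate question.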
Steps/Code to Reproduce
from sklearn.datasets import fetch_mldata
from sklearn.semi_supervised import label_propagation
import numpy
numpy.seterr(all='raise')
mnist = fetch_mldata('MNIST original', data_home="./tmp")
X = mnist.data[1:10000]
y = mnist.target[1:10000]
# Use only 300 labeled examples
y[300:] = -1
lp_model = label_propagation.LabelSpreading(kernel='knn', n_neighbors=7, n_jobs=-1)
lp_model.fit(X,y)
Expected Results
No error is thrown.
Actual Results
File "reproduce.py", line 16, in <module>
    lp_model.fit(X,y)
File "...anaconda3/envs/ssl-py3/lib/python3.6/site-packages/sklearn/semi_supervised/label_propagation.py", line 291, in fit
    self.label_distributions_ /= normalizer
FloatingPointError: invalid value encountered in true_divide
Versions
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] NumPy 1.13.0 SciPy 0.19.0 Scikit-Learn 0.19.dev0
Issue Analytics
- State:
- Created 6 years ago
- Reactions:2
- Comments:17 (12 by maintainers)
I am just wondering whether this issue has been fixed. Any updates? Thanks!
I have replicated this issue when instantiating the LabelSpreading model with the default parameter values, i.e., LabelSpreading(). When I instead instantiate it with LabelSpreading(gamma=0.25, max_iter=5), the error is not thrown. Even LabelSpreading(gamma=0, max_iter=1) works fine; only leaving those two parameters at their defaults triggers the issue:
label_propagation.py:293: RuntimeWarning: invalid value encountered in divide
self.label_distributions_ /= normalizer
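The original report shows a FloatingPointError while this comment shows a RuntimeWarning. That difference is consistent with the numpy.seterr(all='raise') call in the reproduction script. Here is a small sketch of both behaviours on a single all-zero row; normalize_in_place is a hypothetical stand-in for the failing division, not scikit-learn code.

```python
import warnings

import numpy as np


def normalize_in_place(dist):
    """Stand-in for the failing step: divide each row by its sum, in place."""
    dist /= dist.sum(axis=1, keepdims=True)
    return dist


# With NumPy set to warn, 0/0 emits a RuntimeWarning and the result holds NaN.
with np.errstate(all="warn"), warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = normalize_in_place(np.zeros((1, 2)))
got_warning = any(issubclass(w.category, RuntimeWarning) for w in caught)

# With errors raised (as numpy.seterr(all='raise') does globally),
# the same division raises FloatingPointError instead.
raised = False
with np.errstate(all="raise"):
    try:
        normalize_in_place(np.zeros((1, 2)))
    except FloatingPointError:
        raised = True

print(got_warning, raised)
```

Either way the underlying problem is the same zero entry in the normalizer; only the reporting differs.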