NearestCentroid classifier cannot handle data that is always exactly zero
Describe the bug
The NearestCentroid classifier fails when every feature value is exactly zero and shrink_threshold is set.
Steps/Code to Reproduce
import sklearn
from sklearn.neighbors import NearestCentroid
sklearn.show_versions()
X = [(0, 0),
     (0, 0),
     (0, 0),
     (0, 0),
     (0, 0),
     (0, 0),
     (0, 0),
     (0, 0),
     (0, 0),
     (0, 0)]
y = [0, 0, 1, 0, 0, 0, 1, 0, 1, 0]
clf = NearestCentroid(shrink_threshold=0.1)
clf.fit(X, y)
clf.predict(X)
Expected Results
Either no error is thrown and the data is handled correctly, or a ValueError is raised that names the concrete problem, i.e., that all feature values are the same.
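As a rough illustration of the second option, the check could look something like the sketch below. The helper name `check_not_all_constant` and the error message are hypothetical and not part of scikit-learn; the sketch only assumes NumPy and that "all feature values are the same" means every column has identical minimum and maximum.

```python
import numpy as np


def check_not_all_constant(X):
    """Hypothetical helper (not a scikit-learn API): reject all-constant data."""
    X = np.asarray(X, dtype=float)
    # A feature is constant when its minimum equals its maximum across samples.
    constant = X.min(axis=0) == X.max(axis=0)
    if constant.all():
        # Assumed wording; the message should name the concrete problem.
        raise ValueError(
            "All features are constant; shrink_threshold cannot be applied."
        )
    return X


# The data from this report: ten samples, every value exactly zero.
try:
    check_not_all_constant([(0, 0)] * 10)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

With a check of this kind the user would see the actual cause at fit time instead of a NaN error surfacing later in predict.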
Actual Results
C:\Users\sherbold\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\neighbors\nearest_centroid.py:159: RuntimeWarning: invalid value encountered in true_divide
deviation = ((self.centroids_ - dataset_centroid_) / ms)
C:\Users\sherbold\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\neighbors\nearest_centroid.py:162: RuntimeWarning: invalid value encountered in sign
signs = np.sign(deviation)
Traceback (most recent call last):
File "C:/Users/sherbold/PycharmProjects/icb/tmp.py", line 21, in <module>
clf.predict(X)
File "C:\Users\sherbold\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\neighbors\nearest_centroid.py", line 194, in predict
X, self.centroids_, metric=self.metric).argmin(axis=1)]
File "C:\Users\sherbold\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\metrics\pairwise.py", line 1588, in pairwise_distances
return _parallel_pairwise(X, Y, func, n_jobs, **kwds)
File "C:\Users\sherbold\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\metrics\pairwise.py", line 1206, in _parallel_pairwise
return func(X, Y, **kwds)
File "C:\Users\sherbold\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\metrics\pairwise.py", line 232, in euclidean_distances
X, Y = check_pairwise_arrays(X, Y)
File "C:\Users\sherbold\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\metrics\pairwise.py", line 114, in check_pairwise_arrays
estimator=estimator)
File "C:\Users\sherbold\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\utils\validation.py", line 542, in check_array
allow_nan=force_all_finite == 'allow-nan')
File "C:\Users\sherbold\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\utils\validation.py", line 56, in _assert_all_finite
raise ValueError(msg_err.format(type_err, X.dtype))
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
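The two RuntimeWarnings point at a 0 / 0 division. The following NumPy-only sketch reproduces that arithmetic in isolation. The array names mirror the ones in the traceback, but the values are stand-ins, and the assumption that `ms` is zero for constant data is an inference from the warning, not something verified against the library source here.

```python
import numpy as np

# Stand-in values (assumptions): two classes, two features, all data zero.
centroids = np.zeros((2, 2))       # per-class centroids
dataset_centroid = np.zeros(2)     # overall centroid
ms = np.zeros((2, 2))              # scaling term, assumed zero when nothing varies

# Suppress the "invalid value encountered" warning seen in the report.
with np.errstate(invalid="ignore"):
    deviation = (centroids - dataset_centroid) / ms  # 0 / 0 yields NaN
    signs = np.sign(deviation)                       # NaN propagates through sign

has_nan = bool(np.isnan(deviation).all() and np.isnan(signs).all())
print(has_nan)
```

Once the centroids contain NaN, the finiteness check in `check_array` rejects them during `predict`, which would explain why the ValueError mentions the input even though X itself is finite.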
Versions
System:
    python: 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)]
    executable: C:\Users\sherbold\AppData\Local\Programs\Python\Python37\python.exe
    machine: Windows-10-10.0.19041-SP0
Python deps:
    pip: 19.0.3
    setuptools: 42.0.1
    sklearn: 0.21.3
    numpy: 1.17.4
    scipy: 1.3.2
    Cython: None
    pandas: 0.25.3
Issue Analytics
- State:
- Created 3 years ago
- Comments: 5 (5 by maintainers)
@rushabh-v It seems that @Trevor-Waite already took the issue. However, there are plenty of other issues if you want to help.
Closed in #18370.