NearestCentroid classifier cannot handle data that is always exactly zero

Describe the bug

The NearestCentroid classifier cannot handle data in which every feature value is exactly zero when shrink_threshold is set.

Steps/Code to Reproduce

import sklearn
from sklearn.neighbors import NearestCentroid

sklearn.show_versions()

X = [(0, 0),
     (0, 0),
     (0, 0),
     (0, 0),
     (0, 0),
     (0, 0),
     (0, 0),
     (0, 0),
     (0, 0),
     (0, 0)]

y = [0, 0, 1, 0, 0, 0, 1, 0, 1, 0]

clf = NearestCentroid(shrink_threshold=0.1)
clf.fit(X, y)
clf.predict(X)

Expected Results

Either no error is thrown and the data is handled correctly, or a ValueError is raised that names the concrete problem, i.e., that all feature values are the same.
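
As a rough illustration of the second option, here is a minimal sketch of a pre-check that raises a descriptive ValueError. The helper name check_features_vary and its message are hypothetical and not part of scikit-learn; it only shows the kind of error the report asks for.

```python
def check_features_vary(X):
    """Raise ValueError if every feature column of X holds a single value.

    Hypothetical pre-check for illustration; not part of scikit-learn.
    Returns the indices of the constant columns otherwise.
    """
    rows = [tuple(row) for row in X]
    if not rows:
        raise ValueError("X is empty.")
    # Transpose rows into feature columns.
    columns = list(zip(*rows))
    constant = [i for i, col in enumerate(columns) if len(set(col)) == 1]
    if len(constant) == len(columns):
        raise ValueError(
            "All features are constant across samples; "
            "shrink_threshold cannot be applied."
        )
    return constant


# Data from the report: ten samples, two features, all zero.
X = [(0, 0)] * 10
try:
    check_features_vary(X)
except ValueError as exc:
    print(exc)
```

Calling such a check before clf.fit(X, y) would surface the problem at the point where the data is supplied, instead of as a NaN error inside predict.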

Actual Results

C:\Users\sherbold\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\neighbors\nearest_centroid.py:159: RuntimeWarning: invalid value encountered in true_divide
  deviation = ((self.centroids_ - dataset_centroid_) / ms)
C:\Users\sherbold\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\neighbors\nearest_centroid.py:162: RuntimeWarning: invalid value encountered in sign
  signs = np.sign(deviation)
Traceback (most recent call last):
  File "C:/Users/sherbold/PycharmProjects/icb/tmp.py", line 21, in <module>
    clf.predict(X)
  File "C:\Users\sherbold\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\neighbors\nearest_centroid.py", line 194, in predict
    X, self.centroids_, metric=self.metric).argmin(axis=1)]
  File "C:\Users\sherbold\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\metrics\pairwise.py", line 1588, in pairwise_distances
    return _parallel_pairwise(X, Y, func, n_jobs, **kwds)
  File "C:\Users\sherbold\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\metrics\pairwise.py", line 1206, in _parallel_pairwise
    return func(X, Y, **kwds)
  File "C:\Users\sherbold\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\metrics\pairwise.py", line 232, in euclidean_distances
    X, Y = check_pairwise_arrays(X, Y)
  File "C:\Users\sherbold\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\metrics\pairwise.py", line 114, in check_pairwise_arrays
    estimator=estimator)
  File "C:\Users\sherbold\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\utils\validation.py", line 542, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "C:\Users\sherbold\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\utils\validation.py", line 56, in _assert_all_finite
    raise ValueError(msg_err.format(type_err, X.dtype))
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
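
The first warning points at the division in the deviation line as the place where the NaN values appear. The sketch below reproduces that step in isolation with NumPy. It assumes that the scaling term ms is zero when the data has no within-class spread; the array shapes mirror the report (2 classes, 2 features), and the variable names are borrowed from the traceback purely for illustration.

```python
import numpy as np

# Shapes follow the report: 2 classes, 2 features, every value zero.
centroids = np.zeros((2, 2))      # per-class centroids
dataset_centroid = np.zeros(2)    # centroid of the whole dataset
ms = np.zeros((2, 2))             # assumption: scaling term is 0 for zero spread

# 0 / 0 is an "invalid" floating-point operation and yields NaN.
with np.errstate(invalid="ignore"):  # silence the RuntimeWarning in this demo
    deviation = (centroids - dataset_centroid) / ms

has_nan = bool(np.isnan(deviation).all())
print(deviation)
print("all NaN:", has_nan)
```

Once the centroids contain NaN, the input validation in pairwise_distances rejects them during predict, which matches the final ValueError in the traceback.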

Versions

System:
    python: 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)]
    executable: C:\Users\sherbold\AppData\Local\Programs\Python\Python37\python.exe
    machine: Windows-10-10.0.19041-SP0

Python deps:
    pip: 19.0.3
    setuptools: 42.0.1
    sklearn: 0.21.3
    numpy: 1.17.4
    scipy: 1.3.2
    Cython: None
    pandas: 0.25.3

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

2 reactions
glemaitre commented, Sep 4, 2020

@rushabh-v It seems that @Trevor-Waite already took the issue. However, there are plenty of other issues if you want to help.

0 reactions
cmarmo commented, Feb 5, 2021

Closed in #18370.
