Fitting a NearestNeighbors model fails with sparse input and a callable as metric
Description
Fitting a NearestNeighbors model fails when a) the distance metric used is a callable and b) the input to the NearestNeighbors model is sparse.
Steps/Code to Reproduce
from scipy import sparse
from sklearn.neighbors import NearestNeighbors
def sparse_metric(x, y):  # Some metric accepting sparse input
    return x.count_nonzero() / y.count_nonzero()
A = sparse.random(10, 5, density=0.3, format='csr')
nn = NearestNeighbors(algorithm='brute', metric=sparse_metric).fit(A)
Expected Results
No error is thrown when passing a callable as metric with sparse input
Actual Results
ValueError Traceback (most recent call last)
<ipython-input-2-a9d2fd7f843b> in <module>()
7 A = sparse.random(10, 5, density=0.3, format='csr')
8
----> 9 nn = NearestNeighbors(algorithm='brute', metric=sparse_metric).fit(A)
/Volumes/LocalDataHD/thk22/.virtualenvs/nlpy3/lib/python3.5/site-packages/sklearn/neighbors/base.py in fit(self, X, y)
797 or [n_samples, n_samples] if metric='precomputed'.
798 """
--> 799 return self._fit(X)
/Volumes/LocalDataHD/thk22/.virtualenvs/nlpy3/lib/python3.5/site-packages/sklearn/neighbors/base.py in _fit(self, X)
213 if self.effective_metric_ not in VALID_METRICS_SPARSE['brute']:
214 raise ValueError("metric '%s' not valid for sparse input"
--> 215 % self.effective_metric_)
216 self._fit_X = X.copy()
217 self._tree = None
ValueError: metric '<function sparse_metric at 0x1097d0378>' not valid for sparse input
Some Analysis/Wild Speculation
The problem seems to be that, for sparse input, the code only checks whether the given metric is in the list of metrics that accept sparse input. It never checks whether the metric is a string or a callable, so a callable can never pass the membership test and is always rejected: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/neighbors/base.py#L210
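To make the speculation above concrete, here is a minimal, self-contained sketch of a check that lets callables through and validates only named metrics. It is not the scikit-learn source: `VALID_SPARSE_BRUTE_METRICS` and `check_sparse_metric` are hypothetical names invented for illustration, and the set's contents are placeholder values rather than the library's real list.

```python
# Hypothetical stand-in for the library's list of named metrics that
# accept sparse input; the actual contents in scikit-learn may differ.
VALID_SPARSE_BRUTE_METRICS = {"cosine", "euclidean", "manhattan"}


def check_sparse_metric(metric):
    """Return the metric if it is usable with sparse input, else raise.

    Callables are accepted as-is (the caller is responsible for handling
    sparse rows); only string names are checked against the list.
    """
    if callable(metric):
        return metric
    if metric not in VALID_SPARSE_BRUTE_METRICS:
        raise ValueError("metric '%s' not valid for sparse input" % metric)
    return metric


def sparse_metric(x, y):  # Same metric as in the reproduction above
    return x.count_nonzero() / y.count_nonzero()


# A callable passes the check; an unknown string name still raises.
check_sparse_metric(sparse_metric)
check_sparse_metric("euclidean")
```

The only design choice here is the order of the tests: the callable test runs before the membership test, so the error message is reserved for named metrics that genuinely lack sparse support.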
Versions
Darwin-15.6.0-x86_64-i386-64bit
Python 3.5.1 (default, Dec 8 2015, 06:00:08)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
NumPy 1.12.1
SciPy 0.19.0
Scikit-Learn 0.18.2
Issue Analytics
- State:
- Created 6 years ago
- Comments:11 (9 by maintainers)
Top GitHub Comments
A very simple fix would be to change Line 214 in sklearn/neighbors/base.py from [...] to [...]
Cool, I’ll submit a PR along with a test for it.