Fitting a NearestNeighbors model fails with sparse input and a callable as metric

Description

Fitting a NearestNeighbors model raises a ValueError when (a) the distance metric is a callable and (b) the input to the model is sparse.

Steps/Code to Reproduce

from scipy import sparse
from sklearn.neighbors import NearestNeighbors

def sparse_metric(x, y): # Some metric accepting sparse input
    return x.count_nonzero() / y.count_nonzero()

A = sparse.random(10, 5, density=0.3, format='csr')

nn = NearestNeighbors(algorithm='brute', metric=sparse_metric).fit(A)

Expected Results

No error is thrown when a callable is passed as the metric with sparse input.

Actual Results

ValueError                                Traceback (most recent call last)
<ipython-input-2-a9d2fd7f843b> in <module>()
      7 A = sparse.random(10, 5, density=0.3, format='csr')
      8 
----> 9 nn = NearestNeighbors(algorithm='brute', metric=sparse_metric).fit(A)

/Volumes/LocalDataHD/thk22/.virtualenvs/nlpy3/lib/python3.5/site-packages/sklearn/neighbors/base.py in fit(self, X, y)
    797             or [n_samples, n_samples] if metric='precomputed'.
    798         """
--> 799         return self._fit(X)

/Volumes/LocalDataHD/thk22/.virtualenvs/nlpy3/lib/python3.5/site-packages/sklearn/neighbors/base.py in _fit(self, X)
    213             if self.effective_metric_ not in VALID_METRICS_SPARSE['brute']:
    214                 raise ValueError("metric '%s' not valid for sparse input"
--> 215                                  % self.effective_metric_)
    216             self._fit_X = X.copy()
    217             self._tree = None

ValueError: metric '<function sparse_metric at 0x1097d0378>' not valid for sparse input

Some Analysis/Wild Speculation

The problem seems to be that, for sparse input, the code only checks whether the given metric is in the list of metrics that accept sparse input. It never checks whether the metric is a string or a callable, so a callable can never pass: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/neighbors/base.py#L210
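
A minimal sketch of why a plain membership test rejects every callable. The list below is a hypothetical stand-in for VALID_METRICS_SPARSE['brute'] (the real contents are not reproduced here), and sparse_metric is a dummy function used only for the comparison:

```python
# Hypothetical stand-in for VALID_METRICS_SPARSE['brute']; the real
# collection in scikit-learn is assumed to hold metric names as strings.
VALID_SPARSE_BRUTE = ["cosine", "euclidean", "manhattan"]


def sparse_metric(x, y):
    # Dummy callable metric; its body is irrelevant to the membership test.
    return 0.0


# A metric passed by name is found in the list.
name_is_valid = "euclidean" in VALID_SPARSE_BRUTE

# A function object is compared against each string and never matches,
# so every callable fails this check regardless of what it computes.
callable_is_valid = sparse_metric in VALID_SPARSE_BRUTE

print(name_is_valid, callable_is_valid)
```

The first test succeeds and the second does not, which matches the ValueError in the traceback above.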

Versions

Darwin-15.6.0-x86_64-i386-64bit
Python 3.5.1 (default, Dec  8 2015, 06:00:08) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
NumPy 1.12.1
SciPy 0.19.0
Scikit-Learn 0.18.2

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 11 (9 by maintainers)

Top GitHub Comments

2 reactions
tttthomasssss commented, Jun 22, 2017

A very simple fix would be to change Line 214 in sklearn/neighbors/base.py from

if self.effective_metric_ not in VALID_METRICS_SPARSE['brute']:

to

if not callable(self.effective_metric_) and self.effective_metric_ not in VALID_METRICS_SPARSE['brute']:
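
The guarded condition can be tried outside scikit-learn with a small sketch. The names check_sparse_metric and VALID_SPARSE_BRUTE are hypothetical, and the set is an assumed subset of metric names. Only the condition itself comes from the suggestion above:

```python
# Assumed subset of the metric names accepted for sparse input.
VALID_SPARSE_BRUTE = {"cosine", "euclidean", "manhattan"}


def check_sparse_metric(effective_metric):
    """Raise ValueError unless the metric is a callable or a known name."""
    if (not callable(effective_metric)
            and effective_metric not in VALID_SPARSE_BRUTE):
        raise ValueError("metric '%s' not valid for sparse input"
                         % effective_metric)


def sparse_metric(x, y):
    # Dummy callable standing in for a user-defined sparse metric.
    return 0.0


check_sparse_metric("euclidean")    # known name: accepted
check_sparse_metric(sparse_metric)  # callable: accepted by the guard

try:
    check_sparse_metric("not-a-metric")  # unknown name: still rejected
except ValueError as exc:
    print(exc)
```

With this guard a callable is accepted without any check that it can handle sparse rows, so that responsibility stays with the caller.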

1 reaction
tttthomasssss commented, Jun 22, 2017

Cool, I’ll submit a PR along with a test for it.


Top Results From Across the Web

sklearn.neighbors.NearestNeighbors
If metric is a callable function, it takes two arrays representing 1D vectors as inputs and must return one value indicating the distance...

How to allow sklearn K Nearest Neighbors to take custom ...
The callable should take two arrays as input and return one value indicating the distance between them. This works for Scipy's metrics, but...

umap.umap_ — umap 0.3 documentation - Read the Docs
Compute indices of n nearest neighbors knn_indices = fast_knn_indices(X, ... to nn descent in umap if callable(metric): _distance_func = metric elif metric ......

Release History — scikit-learn 0.20.2 documentation
NearestNeighbors where fitting a NearestNeighbors model fails when a) the distance metric used is a callable and b) the input to the NearestNeighbors...

Release History — scikit-learn 0.22.dev0 documentation
This often occurs due to changes in the modelling logic (bug fixes or ... PLSRegression were raising an error when fitted with a...
