Support Haversine distance in NearestNeighbors

Related to https://github.com/scikit-learn/scikit-learn/issues/4453

Currently, using the Haversine distance with the default NearestNeighbors parameters produces an error:

>>> nn = NearestNeighbors(metric="haversine")
>>> nn.fit([[48.8322, 2.3561], [45.7679, 4.8506]])
NearestNeighbors(algorithm='auto', leaf_size=30, metric='haversine',
         metric_params=None, n_jobs=None, n_neighbors=5, p=2, radius=1.0)
>>> nn.kneighbors([[48.8322, 2.3561]], 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rth/src/scikit-learn/sklearn/neighbors/base.py", line 449, in kneighbors
    dist, neigh_ind = zip(*result)
  File "/home/rth/src/scikit-learn/sklearn/metrics/pairwise.py", line 1282, in pairwise_distances_chunked
    n_jobs=n_jobs, **kwds)
  File "/home/rth/src/scikit-learn/sklearn/metrics/pairwise.py", line 1380, in pairwise_distances
    "callable" % (metric, _VALID_METRICS))
ValueError: Unknown metric haversine. Valid metrics are ['euclidean', 'l2', 'l1', 'manhattan', 'cityblock', 'braycurtis', 'canberra', 'chebyshev', 'correlation', 'cosine', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule', 'wminkowski'], or 'precomputed', or a callable

And yet, haversine is a valid metric for ball_tree:

>>> nn = NearestNeighbors(metric="haversine", algorithm='ball_tree')
>>> nn.fit([[48.8322, 2.3561], [45.7679, 4.8506]])
NearestNeighbors(algorithm='ball_tree', leaf_size=30, metric='haversine',
         metric_params=None, n_jobs=None, n_neighbors=5, p=2, radius=1.0)
>>> nn.kneighbors([[48.8322, 2.3561]], 2)
(array([[0.        , 2.80680966]]), array([[0, 1]]))

This means something is wrong in the input validation for algorithm='auto', which might also affect other metrics.

Note: this uses scikit-learn master.
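The ball_tree call above passes raw degrees. scikit-learn's haversine metric expects [latitude, longitude] pairs in radians and returns an angle on the unit sphere, so the 2.80680966 above comes from degree values being read as radians and should not be taken as a geographic distance. Below is a minimal pure-Python sketch of the haversine formula, not scikit-learn's implementation. It converts the issue's two points from degrees to radians and scales the angle by an assumed mean Earth radius of 6371 km.

```python
import math

# Assumed mean Earth radius in kilometres, used to turn an angle into a distance.
EARTH_RADIUS_KM = 6371.0


def haversine(a, b):
    """Great-circle angle (radians) between two (lat, lon) points given in radians."""
    lat1, lon1 = a
    lat2, lon2 = b
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * math.asin(math.sqrt(h))


def to_radians(point):
    """Convert a (lat, lon) pair from degrees to radians."""
    return tuple(math.radians(c) for c in point)


# The two points from the issue, converted before computing the distance.
paris = to_radians((48.8322, 2.3561))
lyon = to_radians((45.7679, 4.8506))
distance_km = haversine(paris, lyon) * EARTH_RADIUS_KM
print(f"{distance_km:.0f} km")
```

The same rule applies when calling nn.fit(...) with metric="haversine": pass coordinates in radians and multiply the distances returned by kneighbors by the sphere's radius to get a length.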

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
asberk commented, Nov 10, 2018

This issue is a result of how algorithm = 'auto' is handled. It seems that NeighborsBase opts for a 'brute' approach when n_neighbors is large relative to the number of samples, and 'brute' does not presently admit 'haversine' as a valid metric.

if self._fit_method == 'auto':
    # A tree approach is better for small number of neighbors,
    # and KDTree is generally faster when available
    if ((self.n_neighbors is None or
         self.n_neighbors < self._fit_X.shape[0] // 2) and
            self.metric != 'precomputed'):
        if self.effective_metric_ in VALID_METRICS['kd_tree']:
            self._fit_method = 'kd_tree'
        elif (callable(self.effective_metric_) or
                self.effective_metric_ in VALID_METRICS['ball_tree']):
            self._fit_method = 'ball_tree'
        else:
            self._fit_method = 'brute'
    else:
        self._fit_method = 'brute'

Because of this, I don’t think hard-coding alg_check will work to evade the call specified by ._fit_method == 'brute'. I agree that resolving #4453 seems like the way to go.
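To make the selection rule concrete, here is a standalone sketch of the 'auto' branch quoted above. The function name choose_fit_method and the two metric sets are hypothetical stand-ins; the sets are abbreviated assumptions, not scikit-learn's full VALID_METRICS lists. Note that the decision is made at fit time from self.n_neighbors, which defaults to 5 (see the repr in the issue), not from the 2 passed to kneighbors.

```python
# Assumed, abbreviated metric sets; scikit-learn's real VALID_METRICS lists are longer.
KD_TREE_METRICS = {"euclidean", "l2", "minkowski", "p", "manhattan",
                   "cityblock", "l1", "chebyshev", "infinity"}
BALL_TREE_METRICS = KD_TREE_METRICS | {"haversine"}


def choose_fit_method(n_neighbors, n_samples, metric):
    """Hypothetical helper mirroring the quoted algorithm='auto' branch."""
    # Trees are preferred only when few neighbors are requested relative to the data.
    if ((n_neighbors is None or n_neighbors < n_samples // 2)
            and metric != "precomputed"):
        if metric in KD_TREE_METRICS:
            return "kd_tree"
        if callable(metric) or metric in BALL_TREE_METRICS:
            return "ball_tree"
        return "brute"
    # Otherwise fall back to brute force, which routes through pairwise_distances.
    return "brute"


# The issue fits 2 samples with the default n_neighbors=5.
print(choose_fit_method(5, 2, "haversine"))     # brute, since 5 < 2 // 2 is False
print(choose_fit_method(5, 1000, "haversine"))  # ball_tree
```

With only two training points, 'auto' therefore always lands on 'brute', and the haversine error surfaces from pairwise_distances rather than from the tree code.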

0 reactions
jnothman commented, Nov 12, 2018

Though for the sake of consistency (and code logic simplicity), maybe it’s still better to add it to pairwise_distances using DistanceMetric.get_metric('haversine').pairwise.

Let’s do it. I would be interested in doing it for all metrics not directly implemented in metrics.pairwise, but I’d be worried about what happens when there are metrics of the same name across scipy.spatial and neighbors.DistanceMetric.
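The proposal amounts to making the brute-force path recognise one more metric name. The sketch below illustrates the idea with a toy name-to-function registry; PAIRWISE_METRICS, register_metric and this pairwise_distances are hypothetical stand-ins written in pure Python, not scikit-learn's actual code. Registration refuses to overwrite an existing name, one possible answer to the same-name concern raised above.

```python
import math


def _euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def _haversine(u, v):
    # Inputs are (lat, lon) in radians; output is an angle on the unit sphere.
    (lat1, lon1), (lat2, lon2) = u, v
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * math.asin(math.sqrt(h))


# Hypothetical registry standing in for the metric names pairwise_distances accepts.
PAIRWISE_METRICS = {"euclidean": _euclidean}


def register_metric(name, func):
    # Refuse silent overrides so two sources cannot claim the same metric name.
    if name in PAIRWISE_METRICS:
        raise ValueError(f"metric {name!r} is already registered")
    PAIRWISE_METRICS[name] = func


def pairwise_distances(X, Y=None, metric="euclidean"):
    if metric not in PAIRWISE_METRICS:
        raise ValueError(f"Unknown metric {metric}. "
                         f"Valid metrics are {sorted(PAIRWISE_METRICS)}")
    Y = X if Y is None else Y
    func = PAIRWISE_METRICS[metric]
    return [[func(x, y) for y in Y] for x in X]


register_metric("haversine", _haversine)
# Two points on the equator, a quarter turn apart in longitude.
D = pairwise_distances([[0.0, 0.0], [0.0, math.pi / 2]], metric="haversine")
print(D)
```

Once the name is registered, a brute-force neighbor search that delegates to this dispatcher would accept "haversine" without needing a tree.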


Top Results From Across the Web

Using Scikit-learn's Binary Trees to Efficiently Find Latitude ...
To use a ball tree with the Haversine distance in scikit-learn, you must first convert the coordinates from degrees to radians. # Creates...

sklearn.neighbors.NearestNeighbors
Algorithm used to compute the nearest neighbors: ... See the documentation of scipy.spatial.distance and the metrics listed in distance_metrics for valid ...

Calculating the minimum haversine distance for a set of ...
These distances are wrong, my first question is, why is this? Is there any way I can correct this while retaining the algorithmic...

Nearest neighbor analysis with large datasets - Read the Docs
So let's use them and find the nearest neighbors! # Find closest public transport stop for each building and get also the distance...

Finding Nearest pair of Latitude and Longitude match using ...
Using Haversine Distance Equation, Here is a python code to find the closest ... My Maps will help to project the coordinates for...
