Support Haversine distance in NearestNeighbors
Related to https://github.com/scikit-learn/scikit-learn/issues/4453
Currently, using the Haversine distance with the default NearestNeighbors
parameters produces an error:
>>> nn = NearestNeighbors(metric="haversine")
>>> nn.fit([[48.8322, 2.3561], [45.7679, 4.8506]])
NearestNeighbors(algorithm='auto', leaf_size=30, metric='haversine',
metric_params=None, n_jobs=None, n_neighbors=5, p=2, radius=1.0)
>>> nn.kneighbors([[48.8322, 2.3561]], 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/rth/src/scikit-learn/sklearn/neighbors/base.py", line 449, in kneighbors
    dist, neigh_ind = zip(*result)
  File "/home/rth/src/scikit-learn/sklearn/metrics/pairwise.py", line 1282, in pairwise_distances_chunked
    n_jobs=n_jobs, **kwds)
  File "/home/rth/src/scikit-learn/sklearn/metrics/pairwise.py", line 1380, in pairwise_distances
    "callable" % (metric, _VALID_METRICS))
ValueError: Unknown metric haversine. Valid metrics are ['euclidean', 'l2', 'l1', 'manhattan', 'cityblock', 'braycurtis', 'canberra', 'chebyshev', 'correlation', 'cosine', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule', 'wminkowski'], or 'precomputed', or a callable
and yet haversine is a valid metric for ball_tree:
>>> nn = NearestNeighbors(metric="haversine", algorithm='ball_tree')
>>> nn.fit([[48.8322, 2.3561], [45.7679, 4.8506]])
NearestNeighbors(algorithm='ball_tree', leaf_size=30, metric='haversine',
metric_params=None, n_jobs=None, n_neighbors=5, p=2, radius=1.0)
>>> nn.kneighbors([[48.8322, 2.3561]], 2)
(array([[0. , 2.80680966]]), array([[0, 1]]))
This means something is wrong in the input validation for algorithm='auto', which might also affect other metrics.
Note: this uses scikit-learn master.
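As a side note on the numbers in the examples above: scikit-learn's haversine metric expects [latitude, longitude] pairs in radians, so the coordinates passed in degrees do not yield a physical distance. The following is a minimal standard-library sketch of the haversine formula for the same two points. The helper name haversine_km and the mean Earth radius of 6371 km are assumptions made for illustration, not part of scikit-learn.

```python
import math

# Assumed mean Earth radius in kilometres (illustrative constant).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees.

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    # Convert degrees to radians before applying the formula.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Central angle times radius gives the arc length.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# The two points from the example, given as (latitude, longitude) in degrees.
d_km = haversine_km(48.8322, 2.3561, 45.7679, 4.8506)
print(round(d_km, 1))
```

With NearestNeighbors, the equivalent approach would be to convert the input to radians first and multiply the returned distances by the chosen radius.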
Top GitHub Comments
This issue is a result of how algorithm='auto' is handled. It seems that NeighborsBase opts for a 'brute' approach when n_neighbors is large relative to the number of samples, and 'brute' does not presently admit 'haversine' as a valid metric.
Because of this, I don't think hard-coding alg_check will work to evade the call specified by _fit_method == 'brute'. I agree that resolving #4453 seems like the way to go.
Let's do it. I would be interested in doing it for all metrics not directly implemented in metrics.pairwise, but I'd be worried about what happens when there are metrics of the same name across scipy.spatial and neighbors.DistanceMetric.
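The 'auto' behaviour described in the first comment can be sketched roughly as follows. This is a simplified, self-contained illustration and not the actual scikit-learn code: the function names choose_fit_method and check_metric, the reduced metric tables, and the exact threshold (n_neighbors compared against half the number of samples) are all assumptions made for the example.

```python
# Hypothetical, heavily reduced stand-ins for the per-algorithm metric tables.
VALID_METRICS = {
    "kd_tree": {"euclidean", "manhattan", "chebyshev", "minkowski"},
    "ball_tree": {"euclidean", "manhattan", "chebyshev", "minkowski",
                  "haversine"},
    "brute": {"euclidean", "manhattan", "chebyshev", "minkowski", "cosine"},
}


def choose_fit_method(metric, n_neighbors, n_samples):
    """Pick an algorithm the way an 'auto' heuristic might (assumed logic)."""
    # Trees are only considered when few neighbors are requested
    # relative to the number of fitted samples (assumed threshold).
    if n_neighbors is None or n_neighbors < n_samples // 2:
        if metric in VALID_METRICS["kd_tree"]:
            return "kd_tree"
        if metric in VALID_METRICS["ball_tree"]:
            return "ball_tree"
    # Otherwise fall back to brute force, without checking the metric.
    return "brute"


def check_metric(metric, fit_method):
    """Raise if the chosen algorithm does not know the metric."""
    if metric not in VALID_METRICS[fit_method]:
        raise ValueError(
            "Unknown metric %s for algorithm %r" % (metric, fit_method))


# Two fitted samples with the default n_neighbors=5, as in the report:
method = choose_fit_method("haversine", n_neighbors=5, n_samples=2)
print(method)
try:
    check_metric("haversine", method)
except ValueError as exc:
    print(exc)
```

Under these assumptions, the fallback to 'brute' happens before the metric is validated against that algorithm, which matches the failure in the report. With more fitted samples, the same call would select 'ball_tree' and succeed.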