BallTree.query returns inconsistent indices between scikit-learn versions 0.24.1 and 1.1.1


Describe the bug

For the same data, BallTree.query can return different nearest-neighbour indices depending on the scikit-learn version when several neighbours are equidistant from the query point. Specifically, this can be observed between versions 0.24.1 and 1.1.1.
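
Ties are easy to hit with the Hamming metric on short vectors. scikit-learn's 'hamming' metric is the fraction of coordinates that differ, so with two features the distance can only be 0, 0.5, or 1. A minimal pure-Python sketch of this, using a small made-up dataset (not the data from this report) and a hypothetical `hamming` helper:

```python
from collections import Counter


def hamming(a, b):
    """Fraction of coordinates at which a and b differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical 2-feature points with values in {0, 1, 2}; not the issue's dataset.
points = [(0, 1), (2, 1), (0, 2), (1, 1), (0, 0)]

query = points[0]
dists = [hamming(query, p) for p in points]

print(dists)           # one distance per point
print(Counter(dists))  # how many points share each distance
```

Here four of the five points sit at distance 0.5 from the query, so a request for k=3 neighbours has to pick two of four equally valid candidates. Which two are picked is an implementation detail.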

Steps/Code to Reproduce

import numpy as np
from sklearn.neighbors import BallTree

np.random.seed(61)

# Generated dataset: 10 points, 2 integer features with values in {0, 1, 2}
X = np.random.randint(0, 3, size=(10, 2))

tree = BallTree(X, leaf_size=4, metric='hamming')
# Query the 3 nearest neighbours of every point in X
distances, indices = tree.query(X, k=3, return_distance=True, dualtree=False, breadth_first=False, sort_results=True)

# Distinct loop names so the result arrays are not shadowed
for i, (point, dists, idxs) in enumerate(zip(X, distances, indices)):
    print(f'index {i}: datapoint {point} distances: {[round(dist, 2) for dist in dists]} indices: {idxs}')

Expected Results

scikit-learn versions 0.24.1 and 1.1.1 should return the same nearest-neighbour indices when multiple neighbours are equidistant from a data point.
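
For comparing results across versions, one option is a brute-force reference that breaks ties by the lowest index. The sketch below is not how BallTree works internally; `knn_stable` and `hamming` are hypothetical helpers, and the dataset is made up for illustration:

```python
def hamming(a, b):
    """Fraction of coordinates at which a and b differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def knn_stable(points, query, k):
    """Return (distances, indices) of the k nearest points.

    Sorting (distance, index) tuples orders equal distances by index,
    so the result does not depend on traversal order.
    """
    ranked = sorted((hamming(query, p), i) for i, p in enumerate(points))
    top = ranked[:k]
    return [d for d, _ in top], [i for _, i in top]


# Hypothetical 2-feature points; not the issue's dataset.
points = [(0, 1), (2, 1), (0, 2), (1, 1), (0, 0)]

dist, idx = knn_stable(points, points[0], k=3)
print(dist, idx)  # [0.0, 0.5, 0.5] [0, 1, 2]
```

Brute force costs a full scan per query, so this is only practical as a check on small inputs. When indices differ between versions only because of ties, the sorted distances should still agree, which makes distances the safer thing to compare.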

Actual Results

scikit-learn v. 0.24.1:

[Screenshot: output under scikit-learn 0.24.1 (image not preserved)]

scikit-learn v. 1.1.1:

[Screenshot: output under scikit-learn 1.1.1 (image not preserved)]

Versions

System:
    python: 3.9.7 (v3.9.7:1016ef3790, Aug 30 2021, 16:39:15) [Clang 6.0 (clang-600.0.57)]
executable: /Users/korigo/Downloads/scikit-0.24/bin/python3
   machine: macOS-10.16-x86_64-i386-64bit

Python dependencies:
          pip: 22.0.4
   setuptools: 62.1.0
      sklearn: 0.24.1
        numpy: 1.22.4
        scipy: 1.8.1
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.1.0
threadpoolctl: 3.1.0

Built with OpenMP: True

System:
    python: 3.9.7 (v3.9.7:1016ef3790, Aug 30 2021, 16:39:15) [Clang 6.0 (clang-600.0.57)]
executable: /Users/korigo/Downloads/scikit-1.1.1/bin/python3
   machine: macOS-10.16-x86_64-i386-64bit

Python dependencies:
      sklearn: 1.1.1
          pip: 22.0.4
   setuptools: 62.1.0
        numpy: 1.22.4
        scipy: 1.8.1
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.1.0
threadpoolctl: 3.1.0

Built with OpenMP: True

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 10 (8 by maintainers)

Top GitHub Comments

ogrisel commented, Jun 22, 2022 (2 reactions)

I opened #23728 to discuss how randomization could be a better way to break ties throughout scikit-learn in general.
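
To make the idea of randomized tie-breaking concrete, here is a minimal brute-force sketch. It is an illustration only and not the design discussed in #23728; `knn_random_ties`, `hamming`, the `seed` parameter, and the dataset are all hypothetical:

```python
import random


def hamming(a, b):
    """Fraction of coordinates at which a and b differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def knn_random_ties(points, query, k, seed=None):
    """Return indices of the k nearest points, ordering ties at random."""
    rng = random.Random(seed)
    # A random secondary key decides the order among equal distances;
    # the index is carried along only as payload.
    ranked = sorted(
        (hamming(query, p), rng.random(), i) for i, p in enumerate(points)
    )
    return [i for _, _, i in ranked[:k]]


# Hypothetical 2-feature points; not the issue's dataset.
points = [(0, 1), (2, 1), (0, 2), (1, 1), (0, 0)]

print(knn_random_ties(points, points[0], k=3, seed=0))
print(knn_random_ties(points, points[0], k=3, seed=1))
```

With a fixed seed the choice is reproducible, while different seeds may select different tied neighbours. That makes it explicit that no particular tied index is the correct one.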

ogrisel commented, Jun 23, 2022 (1 reaction)

Feel free to open a PR and link to this issue in the description.
