BallTree.query returns inconsistent indices between scikit-learn versions 0.24.1 and 1.1.1
Describe the bug
For the same data, BallTree.query can return different indices for nearest neighbours that are equidistant from a query point, depending on the scikit-learn version. Specifically, this can be observed between 0.24.1 and 1.1.1.
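Ties are especially common in this reproducer because the 'hamming' metric measures the fraction of coordinates that differ. With only two features it can take just the values 0, 0.5 and 1, so among 10 points many are necessarily equidistant from any query point. The NumPy-only sketch below illustrates this; `hamming` is a hypothetical helper written for illustration, assumed to match the fraction-of-differing-coordinates definition used by scipy and scikit-learn.

```python
import numpy as np

def hamming(u, v):
    # Hypothetical helper: fraction of coordinates that differ between u and v.
    u, v = np.asarray(u), np.asarray(v)
    return np.mean(u != v)

np.random.seed(61)
X = np.random.randint(0, 3, size=(10, 2))  # same generator call as the report

# Distances from the first point to every point (including itself).
d0 = np.array([hamming(X[0], x) for x in X])
print(d0)
# With 2 features only 0.0, 0.5 and 1.0 are possible, so 10 distances
# must contain repeated values, i.e. ties.
print(sorted(set(d0.tolist())))
```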
Steps/Code to Reproduce
```python
import numpy as np
from sklearn.neighbors import BallTree

np.random.seed(61)
X = np.random.randint(0, 3, size=(10, 2))  # generated dataset
tree = BallTree(X, leaf_size=4, metric='hamming')
distances, indices = tree.query(X, k=3, return_distance=True, dualtree=False,
                                breadth_first=False, sort_results=True)
# Loop variables renamed so they do not shadow the arrays being iterated over.
for i, (point, dists, idxs) in enumerate(zip(X, distances, indices)):
    print(f'index {i}: datapoint {point} distances: {[round(d, 2) for d in dists]} indices: {idxs}')
```
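One possible workaround, not part of scikit-learn's API, is to make the tie-breaking rule explicit: compute all distances and sort by (distance, index), so that among equidistant neighbours the lowest row index always comes first. The sketch below does this by brute force with NumPy. `knn_stable` is a hypothetical helper that hard-codes the hamming metric, and its quadratic cost only makes sense for small datasets like the one in this report.

```python
import numpy as np

def knn_stable(X, Q, k):
    """Brute-force k-NN under the hamming metric, ties broken by row index.

    Hypothetical helper for illustration: returns (distances, indices) where,
    among equal distances, lower row indices of X come first.
    """
    X, Q = np.asarray(X), np.asarray(Q)
    # (n_queries, n_samples) matrix of hamming distances.
    D = (Q[:, None, :] != X[None, :, :]).mean(axis=2)
    n = X.shape[0]
    order = np.empty((Q.shape[0], k), dtype=np.intp)
    for row in range(Q.shape[0]):
        # np.lexsort uses the LAST key as primary: distance first, then index.
        order[row] = np.lexsort((np.arange(n), D[row]))[:k]
    dist = np.take_along_axis(D, order, axis=1)
    return dist, order

np.random.seed(61)
X = np.random.randint(0, 3, size=(10, 2))
distances, indices = knn_stable(X, X, 3)
```

Because the order depends only on the distances and the row indices, and not on how a tree is built or traversed, this should give the same indices on any library version. It does not change what BallTree itself returns.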
Expected Results
scikit-learn 0.24.1 and 1.1.1 were expected to return the same nearest-neighbour indices, including when several neighbours are equidistant from a data point.
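When comparing outputs across versions, a weaker notion than "identical indices" can help separate genuine bugs from tie-ordering differences: two results for one query point can be treated as equivalent if their sorted distances match and, within each group of equal distance, they contain the same indices. Neighbours tied at the k-th distance are a special case, since which of them is selected is itself a tie-break, so only their count (already fixed by the distances) is checked. A minimal sketch; `same_up_to_ties` is a hypothetical helper.

```python
import numpy as np

def same_up_to_ties(dist_a, idx_a, dist_b, idx_b, atol=1e-12):
    """Return True if two sorted k-NN results for one query differ only in tie order.

    Hypothetical helper: dist_* are sorted distance arrays of length k and
    idx_* the matching neighbour indices.
    """
    dist_a, dist_b = np.asarray(dist_a, float), np.asarray(dist_b, float)
    idx_a, idx_b = np.asarray(idx_a), np.asarray(idx_b)
    if dist_a.shape != dist_b.shape or not np.allclose(dist_a, dist_b, rtol=0, atol=atol):
        return False
    kth = dist_a[-1]
    for value in np.unique(dist_a):
        if np.isclose(value, kth, rtol=0, atol=atol):
            continue  # boundary ties: membership itself is a tie-break
        pos = np.isclose(dist_a, value, rtol=0, atol=atol)
        # Same positions in both results hold this distance group.
        if set(idx_a[pos].tolist()) != set(idx_b[pos].tolist()):
            return False
    return True
```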
Actual Results
scikit-learn v. 0.24.1:
scikit-learn v. 1.1.1:
Versions
System:
python: 3.9.7 (v3.9.7:1016ef3790, Aug 30 2021, 16:39:15) [Clang 6.0 (clang-600.0.57)]
executable: /Users/korigo/Downloads/scikit-0.24/bin/python3
machine: macOS-10.16-x86_64-i386-64bit
Python dependencies:
pip: 22.0.4
setuptools: 62.1.0
sklearn: 0.24.1
numpy: 1.22.4
scipy: 1.8.1
Cython: None
pandas: None
matplotlib: None
joblib: 1.1.0
threadpoolctl: 3.1.0
Built with OpenMP: True
System:
python: 3.9.7 (v3.9.7:1016ef3790, Aug 30 2021, 16:39:15) [Clang 6.0 (clang-600.0.57)]
executable: /Users/korigo/Downloads/scikit-1.1.1/bin/python3
machine: macOS-10.16-x86_64-i386-64bit
Python dependencies:
sklearn: 1.1.1
pip: 22.0.4
setuptools: 62.1.0
numpy: 1.22.4
scipy: 1.8.1
Cython: None
pandas: None
matplotlib: None
joblib: 1.1.0
threadpoolctl: 3.1.0
Built with OpenMP: True
Issue Analytics
- Created a year ago
- Reactions: 1
- Comments: 10 (8 by maintainers)
Top GitHub Comments
I opened #23728 to discuss how randomization could be a better way to break ties throughout scikit-learn in general.
Feel free to open a PR and link to this issue in the description.
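As a rough illustration of the randomized tie-breaking idea raised in #23728 (not what scikit-learn implements), ties in a brute-force neighbour search can be broken with a seeded random key: for a fixed seed the result is reproducible, while no particular index order is systematically favoured. `knn_random_ties` is a hypothetical helper that takes a precomputed distance matrix.

```python
import numpy as np

def knn_random_ties(D, k, seed=0):
    """Select k nearest neighbours per row of distance matrix D,
    breaking ties with a seeded random key (hypothetical helper)."""
    D = np.asarray(D, dtype=float)
    rng = np.random.default_rng(seed)
    idx = np.empty((D.shape[0], k), dtype=np.intp)
    for row in range(D.shape[0]):
        # Primary key: distance (last in lexsort); secondary: a random number.
        tie_key = rng.random(D.shape[1])
        idx[row] = np.lexsort((tie_key, D[row]))[:k]
    return np.take_along_axis(D, idx, axis=1), idx
```

Compared with breaking ties by lowest index, this avoids a systematic bias toward earlier rows, at the cost of results that depend on the seed.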