Support for float32 in `KDTree` and `BallTree`
Description
The conversion from float32 to float64 in Isomap can be traced back to the Cython class BinaryTree.
Steps/Code to Reproduce
Debug example:
from sklearn.datasets import load_digits
from sklearn.manifold import Isomap
X, _ = load_digits(return_X_y=True)
X = X.astype('float32')
embedding = Isomap(n_components=2)
X_transformed = embedding.fit_transform(X[:100])
The data type changes at line 157 of isomap, in the call to kneighbors_graph:
kng = kneighbors_graph(self.nbrs_, self.n_neighbors,
                       metric=self.metric, p=self.p,
                       metric_params=self.metric_params,
                       mode='distance', n_jobs=self.n_jobs)
which leads to class KNeighborsMixin in _base.py, method kneighbors (line 531):
chunked_results = Parallel(n_jobs, **parallel_kwargs)(
    delayed_query(
        self._tree, X[s], n_neighbors, return_distance)
    for s in gen_even_slices(X.shape[0], n_jobs)
)
Here, X has dtype float32, while chunked_results is a list of arrays with dtype float64.
The problem arises in the query method of class BinaryTree in _binary_tree.pxi (line 1271). The data are cast to either DTYPE or DTYPE_t, which are defined as np.float64.
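The cast described above can be observed without going through Isomap by querying a tree directly. The following is a minimal sketch, assuming scikit-learn and NumPy are installed; the random data and the names X32, dist and ind are illustrative and not part of the original report. It only prints the dtypes, since the result may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.neighbors import KDTree

# Illustrative float32 input (not the digits data from the report).
rng = np.random.default_rng(0)
X32 = rng.random((100, 3)).astype(np.float32)

# Build the tree on float32 data and query it with the same float32 data.
tree = KDTree(X32)
dist, ind = tree.query(X32, k=5)

# The issue reports that the distances come back as float64
# even though the input was float32.
print("input dtype:   ", X32.dtype)
print("distance dtype:", dist.dtype)
print("index dtype:   ", ind.dtype)
```

If the distance dtype printed here is float64, the upcast happens inside the tree itself, independently of Isomap or kneighbors_graph.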
Class BinaryTree should be changed to allow computations in both float32 and float64 data types.
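To illustrate what "both data types" would mean in practice, here is a rough NumPy-only sketch of a dtype-preserving neighbor search. The function brute_knn and its preserve_dtype flag are hypothetical and are not scikit-learn code; it uses brute force rather than a tree, and only shows the intended behaviour (float32 in, float32 out) next to the current one (everything upcast to float64).

```python
import numpy as np

def brute_knn(X, k, preserve_dtype=True):
    """Hypothetical brute-force k-NN; not part of scikit-learn."""
    X = np.asarray(X)
    # Current behaviour reported in the issue: always cast to float64.
    # Requested behaviour: keep float32 input as float32.
    if not preserve_dtype or X.dtype not in (np.float32, np.float64):
        X = X.astype(np.float64)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = (X * X).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0, out=d2)  # clip small negative rounding errors
    ind = np.argsort(d2, axis=1)[:, :k]
    dist = np.sqrt(np.take_along_axis(d2, ind, axis=1))
    return dist, ind

rng = np.random.default_rng(0)
X32 = rng.random((50, 3)).astype(np.float32)

d32, _ = brute_knn(X32, k=3)                        # stays in float32
d64, _ = brute_knn(X32, k=3, preserve_dtype=False)  # upcast to float64

print(d32.dtype, d32.nbytes)
print(d64.dtype, d64.nbytes)
```

The float64 result takes twice the memory of the float32 one for the same number of distances, which is the cost the requested change would avoid.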
Issue Analytics
- State:
- Created 4 years ago
- Comments: 6 (6 by maintainers)
Thank you very much for such a detailed reply! To be honest I don’t have any prior experience with Cython but I am interested in learning and gaining some experience with it. I trust your judgement and would be fine with resolving compilation warnings.
Good investigation job! You can explore whether Cython's fused types can solve this. If not, you'll have to resort to templates, but that's more work and less elegant.
Thanks for doing this!