Support for float32 in `KDTree` and `BallTree`

Description

Relates to #7059 and #11000.

The conversion from float32 to float64 in Isomap can be traced to the Cython class BinaryTree.

Steps/Code to Reproduce

Debug example:

from sklearn.datasets import load_digits
from sklearn.manifold import Isomap
X, _ = load_digits(return_X_y=True)
X = X.astype('float32')
embedding = Isomap(n_components=2)
X_transformed = embedding.fit_transform(X[:100])

The change of data type occurs at line 157 of isomap, in the call to kneighbors_graph:

kng = kneighbors_graph(self.nbrs_, self.n_neighbors,
                       metric=self.metric, p=self.p,
                       metric_params=self.metric_params,
                       mode='distance', n_jobs=self.n_jobs)

This leads to the kneighbors method of class KNeighborsMixin in _base.py (line 531):

chunked_results = Parallel(n_jobs, **parallel_kwargs)(
    delayed_query(self._tree, X[s], n_neighbors, return_distance)
    for s in gen_even_slices(X.shape[0], n_jobs)
)

Here, X has dtype float32, while chunked_results is a list of arrays of dtype float64.

The problem arises in the query method of class BinaryTree in _binary_tree.pxi (line 1271). Data are cast to either DTYPE or DTYPE_t, both of which are defined as np.float64.
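
The effect of such a cast can be illustrated with NumPy alone. This is a minimal sketch, not the scikit-learn code: it assumes the cast behaves like `np.asarray(X, dtype=DTYPE)`, and `cast_for_query` is a hypothetical helper introduced only for illustration.

```python
import numpy as np

# Assumption: DTYPE mirrors the float64 alias named in the issue.
DTYPE = np.float64


def cast_for_query(X):
    """Hypothetical stand-in for the cast done at the start of a tree query."""
    return np.asarray(X, dtype=DTYPE, order="C")


X = np.ones((100, 64), dtype=np.float32)
X_cast = cast_for_query(X)

print(X.dtype, X.nbytes)            # float32 25600
print(X_cast.dtype, X_cast.nbytes)  # float64 51200
print(np.shares_memory(X, X_cast))  # False: the cast made a full copy
```

Under this assumption, float32 input is copied and its memory footprint doubles before any distance is computed, and every result derived from it is float64.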

Class BinaryTree should be changed to allow computations in both float32 and float64.
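
One way to picture "both data types" is to keep one specialised implementation per dtype and dispatch on the input. The sketch below is a plain-Python analogy of that idea only. It is not how scikit-learn implements BinaryTree, it uses a brute-force search instead of a real tree, and every name in it (`make_brute_tree`, `TREES`, `build_tree`) is hypothetical.

```python
import numpy as np


def make_brute_tree(dtype):
    """Build a hypothetical nearest-neighbor class specialised for one dtype."""
    dtype = np.dtype(dtype)

    class BruteTree:
        # Stand-in for a real tree: stores the data, answers queries by brute force.
        def __init__(self, data):
            self.data = np.ascontiguousarray(data, dtype=dtype)

        def query(self, X, k=1):
            X = np.ascontiguousarray(X, dtype=dtype)
            # Pairwise Euclidean distances, computed entirely in `dtype`.
            diff = X[:, None, :] - self.data[None, :, :]
            dist = np.sqrt((diff * diff).sum(axis=-1))
            ind = np.argsort(dist, axis=1)[:, :k]
            return np.take_along_axis(dist, ind, axis=1), ind

    BruteTree.__name__ = f"BruteTree{dtype.itemsize * 8}"
    return BruteTree


# One specialised class per supported dtype, as a template expansion would produce.
TREES = {np.dtype(t): make_brute_tree(t) for t in (np.float32, np.float64)}


def build_tree(data):
    """Dispatch on the input dtype; fall back to float64 for anything else."""
    data = np.asarray(data)
    cls = TREES.get(data.dtype, TREES[np.dtype(np.float64)])
    return cls(data)


rng = np.random.default_rng(0)
X32 = rng.random((20, 3)).astype(np.float32)
dist32, ind32 = build_tree(X32).query(X32[:5], k=3)
print(dist32.dtype)  # float32: no upcast anywhere in the query path
```

In Cython the same specialisation would be generated at compile time rather than through a Python class factory, which is what the fused-type and Tempita suggestions in the comments are about.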

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
OmarManzoor commented, Nov 7, 2022

Hi @OmarManzoor,

Thanks for your interest!

As stated by Gaël in #15474 (comment) and as explained in #7059 (comment), Tempita might be the best solution to introduce support for float32 in KDTree and in BallTree.

This is not the best "good first issue" for getting started with Cython in scikit-learn.

What is your experience with Cython so far, @OmarManzoor?

I think resolving the C and C++ compilation warnings in the source files generated from the Cython implementations is a good first Cython issue. I need to file an issue describing this, but you can start to have a look at it in the meantime. Do you think this is a good starting point for you?

Thank you very much for such a detailed reply! To be honest I don’t have any prior experience with Cython but I am interested in learning and gaining some experience with it. I trust your judgement and would be fine with resolving compilation warnings.

1 reaction
GaelVaroquaux commented, Nov 2, 2019

Good investigation job! You can explore whether Cython's fused types make it possible to solve this. If not, you'll have to resort to templates, but that's more work and less elegant.

Thanks for doing this!
