Nearest neighbors with trees perf decreased by debugging stats
Description
For the ball_tree and kd_tree algorithms, some stats collected during tree queries greatly reduce the performance gain from parallelization.
Those stats are:
- n_trims: queried points outside node radius
- n_leaves: leaves reached while querying
- n_splits: non-leaf nodes queried
- n_calls: number of computed distances
Those stats only seem useful for debugging and do not look like part of the official API (no documentation). Only 2 (personal) git repos use the method (get_tree_stats) to get them.
Deactivating them greatly improves the performance of the associated algorithms when n_jobs > 1.
Benchmark
Test of kneighbors function with default parameters and:
- samples dimension: 100
- fit: 10k samples
- kneighbors: 10k samples
(Also tested OpenMP prange parallelism, but it does not improve performance.)
=============
=== brute ===
=============
Joblib (loky) :
- n_jobs = 1 (MKL mono threaded) -> 2.6s
- n_jobs = 1 (MKL multi threaded, 40 threads) -> 1.9s
- n_jobs = 4 -> 4.0s
- n_jobs = 10 -> 3.5s
- n_jobs = 40 -> 3.5s
=================
=== ball_tree ===
=================
Joblib (loky) :
- n_jobs = 1 -> 10.9s
- n_jobs = 4 -> 7.7s
- n_jobs = 10 -> 6.8s
- n_jobs = 40 -> 3.8s
Joblib (loky) no stats:
- n_jobs = 1 -> 12.0s
- n_jobs = 4 -> 3.2s
- n_jobs = 10 -> 1.4s
- n_jobs = 40 -> 0.6s
OpenMP no stats:
- n_jobs = 4 -> 3.2s
- n_jobs = 10 -> 1.4s
===============
=== kd_tree ===
===============
Joblib (loky) :
- n_jobs = 1 -> 19.1s
- n_jobs = 4 -> 9.0s
- n_jobs = 10 -> 10.9s
- n_jobs = 40 -> 8.5s
Joblib (loky) no stats:
- n_jobs = 1 -> 19.0s
- n_jobs = 4 -> 5.1s
- n_jobs = 10 -> 2.2s
- n_jobs = 40 -> 1.0s
OpenMP no stats:
- n_jobs = 4 -> 5.1s
- n_jobs = 10 -> 2.2s
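To make the scaling difference easier to read, the Joblib (loky) timings above can be turned into speedups relative to n_jobs = 1. This is only a small arithmetic sketch: the numbers are copied from the benchmark, while the `TIMINGS` dictionary and the `speedups` helper are names made up for illustration.

```python
# Timings in seconds, copied from the Joblib (loky) rows of the benchmark above.
# The dictionary layout and the helper name are illustrative assumptions.
TIMINGS = {
    "ball_tree": {
        "with stats": {1: 10.9, 4: 7.7, 10: 6.8, 40: 3.8},
        "no stats": {1: 12.0, 4: 3.2, 10: 1.4, 40: 0.6},
    },
    "kd_tree": {
        "with stats": {1: 19.1, 4: 9.0, 10: 10.9, 40: 8.5},
        "no stats": {1: 19.0, 4: 5.1, 10: 2.2, 40: 1.0},
    },
}


def speedups(times_by_jobs):
    """Return {n_jobs: t(n_jobs=1) / t(n_jobs)} for one timing series."""
    baseline = times_by_jobs[1]
    return {n_jobs: baseline / t for n_jobs, t in times_by_jobs.items()}


if __name__ == "__main__":
    for algorithm, variants in TIMINGS.items():
        for variant, times in variants.items():
            row = ", ".join(
                f"n_jobs={n}: {s:.1f}x" for n, s in speedups(times).items()
            )
            print(f"{algorithm:9s} {variant:10s} {row}")
```

With these numbers, ball_tree at n_jobs = 40 reaches roughly 2.9x with the stats and 20x without them. kd_tree goes from roughly 2.2x to 19x.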
Issue Analytics
- State:
- Created 5 years ago
- Reactions:2
- Comments:6 (6 by maintainers)
Top GitHub Comments
One possible explanation would be a typical case of false sharing: CPU cache invalidation caused by concurrent writes to contiguously allocated data structure fields that live in the same cache line.
One way to check this hypothesis would be to use Linux perf or cachegrind to collect cache invalidation statistics with and without #19884.
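As a rough illustration of why contiguous counters could end up in the same cache line, the sketch below lays out four integer fields with ctypes and maps each one to a cache line index. It rests on assumptions: a 64-byte cache line, a struct that starts on a line boundary, and a hypothetical `TreeStats` layout that only borrows the counter names from the issue. It does not reproduce the actual memory layout in scikit-learn and does not measure cache behaviour.

```python
import ctypes

CACHE_LINE_BYTES = 64  # assumption: typical x86-64 cache line size


class TreeStats(ctypes.Structure):
    """Hypothetical layout: four counters stored back to back."""
    _fields_ = [
        ("n_trims", ctypes.c_int),
        ("n_leaves", ctypes.c_int),
        ("n_splits", ctypes.c_int),
        ("n_calls", ctypes.c_int),
    ]


class PaddedCounter(ctypes.Structure):
    """One counter padded so that it occupies a full cache line."""
    _fields_ = [
        ("value", ctypes.c_int),
        ("_pad", ctypes.c_char * (CACHE_LINE_BYTES - ctypes.sizeof(ctypes.c_int))),
    ]


def cache_lines(struct_type, base_offset=0):
    """Map each field name to the index of the cache line it starts in.

    base_offset is the struct's byte offset from a cache line boundary
    (0 means the struct is assumed to start exactly on a boundary).
    """
    return {
        name: (base_offset + getattr(struct_type, name).offset) // CACHE_LINE_BYTES
        for name, *_ in struct_type._fields_
    }


if __name__ == "__main__":
    # All four contiguous counters land in line 0 under these assumptions.
    print(cache_lines(TreeStats))
    # Padded counters are one cache line apart when stored in an array.
    print(ctypes.sizeof(PaddedCounter))
```

Under these assumptions, a write to any one counter invalidates the line holding all four. Padding each counter to a full line, or keeping per-thread counters in separate allocations, is the usual way to avoid that. Whether this is what happens here is exactly what the perf or cachegrind measurement would need to confirm.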