
Nearest neighbors with trees perf decreased by debugging stats

Description

For the ball_tree and kd_tree algorithms, some statistics collected during tree queries severely limit the speedup gained from parallelization.

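These stats are:

  • n_trims: number of nodes trimmed from a query because the query point lies too far outside the node radius
  • n_leaves: number of leaf nodes reached while querying
  • n_splits: number of non-leaf nodes visited while querying
  • n_calls: number of distance computations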
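To make the four counters concrete, here is a minimal pure-Python sketch of a depth-first k-nearest-neighbor query on a toy ball tree that increments each counter at the point the descriptions above suggest. It is an illustration under assumptions, not scikit-learn's Cython implementation: Node, the median split rule, new_stats and query are hypothetical names, and the counters live in a dict passed to the query rather than on the tree object.

```python
import heapq
import math
import random

STAT_NAMES = ("n_trims", "n_leaves", "n_splits", "n_calls")


def new_stats():
    # Fresh counters (hypothetical container for the four stats).
    return dict.fromkeys(STAT_NAMES, 0)


def dist(a, b, stats):
    # Every distance evaluation during a query is counted (n_calls).
    stats["n_calls"] += 1
    return math.dist(a, b)


class Node:
    """Hypothetical ball-tree node: a centroid, a radius and point indices."""

    def __init__(self, points, idx, leaf_size):
        dim = len(points[0])
        self.idx = idx
        self.centroid = [sum(points[i][d] for i in idx) / len(idx) for d in range(dim)]
        self.radius = max(math.dist(self.centroid, points[i]) for i in idx)
        self.children = []
        if len(idx) > leaf_size:
            # Split at the median of the dimension with the largest spread.
            def spread(d):
                values = [points[i][d] for i in idx]
                return max(values) - min(values)

            d = max(range(dim), key=spread)
            order = sorted(idx, key=lambda i: points[i][d])
            mid = len(order) // 2
            self.children = [Node(points, order[:mid], leaf_size),
                             Node(points, order[mid:], leaf_size)]


def query(root, points, q, k, stats):
    """Return the k nearest (distance, index) pairs to q, updating stats."""
    heap = []  # max-heap of the best k so far, stored as (-distance, index)

    def kth():
        return -heap[0][0] if len(heap) == k else math.inf

    def visit(node):
        # Triangle inequality: no point in the ball is closer than this bound.
        lower = max(0.0, dist(q, node.centroid, stats) - node.radius)
        if lower > kth():
            stats["n_trims"] += 1   # pruned: the ball cannot hold a closer point
        elif not node.children:
            stats["n_leaves"] += 1  # leaf reached: scan its points
            for i in node.idx:
                d = dist(q, points[i], stats)
                if d < kth():
                    heapq.heappush(heap, (-d, i))
                    if len(heap) > k:
                        heapq.heappop(heap)
        else:
            stats["n_splits"] += 1  # internal node: descend into both children
            for child in node.children:
                visit(child)

    visit(root)
    return sorted((-neg_d, i) for neg_d, i in heap)


if __name__ == "__main__":
    random.seed(0)
    pts = [[random.random() for _ in range(3)] for _ in range(500)]
    tree = Node(pts, list(range(len(pts))), leaf_size=20)
    stats = new_stats()
    print(query(tree, pts, [0.5, 0.5, 0.5], k=3, stats=stats))
    print(stats)
```

In this sketch every visited node increments exactly one of n_trims, n_leaves or n_splits, and every distance evaluation (including the one to each node centroid) increments n_calls, so if the counters are shared by all workers they are written on essentially every step of the traversal.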

These stats appear to be useful only for debugging. They do not seem to be part of the official API (they are undocumented), and only two (personal) Git repositories use the method that exposes them (get_tree_stats).

Disabling them substantially improves the parallel performance of the affected algorithms.

Benchmark

Test of the kneighbors method with default parameters and:

  • feature dimension: 100
  • fit: 10k samples
  • kneighbors: 10k query samples

(OpenMP parallelism via prange was also tested, but it did not improve performance compared with Joblib.)
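A minimal timing harness in the spirit of this benchmark is sketched below: it reports the best-of-repeat wall-clock time of a callable for each n_jobs value. The bench function and its defaults are assumptions for illustration, and it is not necessarily how the numbers below were measured. The commented scikit-learn usage assumes NearestNeighbors with default parameters on random 100-dimensional data, fitted once on 10k samples and queried with 10k samples.

```python
import timeit


def bench(run, n_jobs_values=(1, 4, 10, 40), repeat=3):
    """Return {n_jobs: best wall-clock seconds of run(n_jobs)} over `repeat` runs."""
    return {
        n: min(timeit.repeat(lambda: run(n), number=1, repeat=repeat))
        for n in n_jobs_values
    }


# Example with scikit-learn (needs scikit-learn and NumPy installed):
#
# import numpy as np
# from sklearn.neighbors import NearestNeighbors
#
# rng = np.random.default_rng(0)
# X_fit = rng.random((10_000, 100))    # fit: 10k samples, dimension 100
# X_query = rng.random((10_000, 100))  # kneighbors: 10k samples
# nn = NearestNeighbors(algorithm="ball_tree").fit(X_fit)
#
# def run(n_jobs):
#     # Only the query is timed; the tree is built once above.
#     nn.set_params(n_jobs=n_jobs).kneighbors(X_query)
#
# print(bench(run))
```

Taking the minimum over repeats filters out one-off noise such as worker start-up, which matters when comparing small n_jobs values.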

=============
=== brute ===
=============
Joblib (loky):
- n_jobs = 1 (MKL single-threaded) -> 2.6s
- n_jobs = 1 (MKL multi-threaded, 40 threads) -> 1.9s
- n_jobs = 4  -> 4.0s
- n_jobs = 10 -> 3.5s
- n_jobs = 40 -> 3.5s

=================
=== ball_tree ===
=================
Joblib (loky):
- n_jobs = 1  -> 10.9s
- n_jobs = 4  ->  7.7s
- n_jobs = 10 ->  6.8s
- n_jobs = 40 ->  3.8s

Joblib (loky) no stats:
- n_jobs = 1  -> 12.0s
- n_jobs = 4  ->  3.2s
- n_jobs = 10 ->  1.4s
- n_jobs = 40 ->  0.6s

OpenMP no stats:
- n_jobs = 4  ->  3.2s
- n_jobs = 10 ->  1.4s

===============
=== kd_tree ===
===============
Joblib (loky):
- n_jobs = 1  -> 19.1s
- n_jobs = 4  ->  9.0s
- n_jobs = 10 -> 10.9s
- n_jobs = 40 ->  8.5s

Joblib (loky) no stats:
- n_jobs = 1  -> 19.0s
- n_jobs = 4  ->  5.1s
- n_jobs = 10 ->  2.2s
- n_jobs = 40 ->  1.0s

OpenMP no stats:
- n_jobs = 4  ->  5.1s
- n_jobs = 10 ->  2.2s
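The speedups implied by these timings can be computed directly: speedup is T(1) / T(n_jobs) and parallel efficiency is speedup / n_jobs. The short Python script below does this with the numbers copied from the tables above (brute is left out because it has no stats to toggle); TIMINGS and scaling are names chosen for this example.

```python
# Wall-clock times in seconds, copied from the Joblib rows of the benchmark.
TIMINGS = {
    "ball_tree": {1: 10.9, 4: 7.7, 10: 6.8, 40: 3.8},
    "ball_tree (no stats)": {1: 12.0, 4: 3.2, 10: 1.4, 40: 0.6},
    "kd_tree": {1: 19.1, 4: 9.0, 10: 10.9, 40: 8.5},
    "kd_tree (no stats)": {1: 19.0, 4: 5.1, 10: 2.2, 40: 1.0},
}


def scaling(times):
    """Return {n_jobs: (speedup, efficiency)} relative to n_jobs = 1."""
    t1 = times[1]
    return {n: (t1 / t, t1 / t / n) for n, t in times.items()}


for name, times in TIMINGS.items():
    row = ", ".join(
        f"n_jobs={n}: {s:.1f}x ({e:.0%})" for n, (s, e) in scaling(times).items()
    )
    print(f"{name:22} {row}")
```

By this arithmetic, ball_tree reaches about 2.9x at n_jobs = 40 with the stats and 20x without them, while kd_tree reaches about 2.2x versus 19x. Without the stats, efficiency stays around 86 to 94% up to n_jobs = 10 and drops to about 48 to 50% at n_jobs = 40.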

Issue Analytics

  • State: open
  • Created 5 years ago
  • Reactions: 2
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

ogrisel commented, Apr 13, 2021

Without the stats, the performance scaling indeed looks almost perfect (runtime roughly proportional to 1/n_jobs), while it is much worse when they are enabled.

One possible explanation is a typical case of false sharing: CPU cache invalidation caused by concurrent writes to contiguously allocated data-structure fields that live on the same cache line.

ogrisel commented, Apr 14, 2021

One way to check this hypothesis would be to use Linux perf or Cachegrind to collect cache-invalidation statistics with and without #19884.
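If the workers are threads that all update counters on one shared object (an assumption here), every increment is a write to the same memory. One common alternative to dropping the stats, not what the issue proposes, is to give each worker private counters and sum them once at the end. The Python sketch below only shows that data flow: query_chunk and parallel_kneighbors are hypothetical names, a brute-force scan stands in for the tree query, and because of the GIL it says nothing about cache behavior or speed. In native code, per-thread counters would additionally need to sit on separate cache lines (for example via padding) so they do not falsely share among themselves.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def query_chunk(train, queries, k):
    """Hypothetical worker: brute-force k-NN for one chunk of queries.

    Its counters live in a local Counter, so no two workers ever write
    to the same counter while they run.
    """
    local = Counter()
    results = []
    for q in queries:
        dists = []
        for i, p in enumerate(train):
            local["n_calls"] += 1
            dists.append((math.dist(q, p), i))
        results.append(sorted(dists)[:k])
    return results, local


def parallel_kneighbors(train, queries, k, n_jobs):
    """Split queries into n_jobs contiguous chunks, then merge results and stats."""
    size = max(1, math.ceil(len(queries) / n_jobs))
    chunks = [queries[s:s + size] for s in range(0, len(queries), size)]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        parts = list(pool.map(lambda chunk: query_chunk(train, chunk, k), chunks))
    # pool.map preserves chunk order, so results match a serial run.
    results = [r for chunk_results, _ in parts for r in chunk_results]
    # One reduction at the end replaces many concurrent writes to shared fields.
    totals = sum((stats for _, stats in parts), Counter())
    return results, totals
```

Counter addition keeps the reduction to one line; in a Cython or C version the same idea would be an array of per-thread counters summed after the parallel loop.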
