Parallelization not working for nearest neighbors computation in umap 0.4
Hi there,
I’m analyzing single-cell RNA-seq data using scanpy on an Ubuntu virtual machine with 16 CPUs and the following package versions:
- umap-learn 0.4.0 (installed from the 0.4dev branch earlier today)
- pynndescent 0.3.3
- scanpy 1.4.5.dev175+g64f04d8 (installed from the master branch earlier today)
I’m using the development 0.4 version of umap because it is supposed to support parallelized computation of the nearest neighbors. Specifically, scanpy calls:
from umap.umap_ import nearest_neighbors
from sklearn.utils import check_random_state  # where scanpy gets check_random_state

random_state = check_random_state(random_state)
knn_indices, knn_dists, forest = nearest_neighbors(
    X, n_neighbors, random_state=random_state,
    metric=metric, metric_kwds=metric_kwds,
    angular=angular, verbose=verbose,
)
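For a standalone reproduction outside scanpy, the sketch below mirrors that call on synthetic data (the matrix shape, n_neighbors=15, and the Euclidean metric are illustrative choices, not values from the issue):

import numpy as np
from sklearn.utils import check_random_state
from umap.umap_ import nearest_neighbors

# Hypothetical stand-in for the expression matrix
X = np.random.rand(10000, 50).astype(np.float32)

knn_indices, knn_dists, forest = nearest_neighbors(
    X, 15, random_state=check_random_state(42),
    metric="euclidean", metric_kwds={},
    angular=False, verbose=True,
)
print(knn_indices.shape, knn_dists.shape)  # (10000, 15) for both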
The code works but, empirically, it only uses a single CPU. It also gives the following warning message:
/opt/miniconda3/envs/py37_2/lib/python3.7/site-packages/numba/compiler.py:602: NumbaPerformanceWarning:
The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.
To find out why, try turning on parallel diagnostics, see http://numba.pydata.org/numba-doc/latest/user/parallel.html#diagnostics for help.
File "../../../../../opt/miniconda3/envs/py37_2/lib/python3.7/site-packages/umap/nndescent.py", line 47:
@numba.njit(parallel=True)
def nn_descent(
^
self.func_ir.loc))
and when I turn on Numba’s parallel diagnostics, it gives the report below. Any idea what is going on, or whether parallelized approximate nearest neighbor computation is supposed to be supported? Thanks!
================================================================================
Parallel Accelerator Optimizing: Function make_nn_descent.<locals>.nn_descent,
/opt/miniconda3/envs/py37_2/lib/python3.7/site-packages/umap/nndescent.py (46)
================================================================================
Parallel loop listing for Function make_nn_descent.<locals>.nn_descent, /opt/miniconda3/envs/py37_2/lib/python3.7/site-packages/umap/nndescent.py (46)
------------------------------------------------------------------------------------------|loop #ID
@numba.njit(parallel=True) |
def nn_descent( |
data, |
n_neighbors, |
rng_state, |
max_candidates=50, |
n_iters=10, |
delta=0.001, |
rho=0.5, |
rp_tree_init=True, |
leaf_array=None, |
verbose=False, |
): |
n_vertices = data.shape[0] |
|
current_graph = make_heap(data.shape[0], n_neighbors) |
for i in range(data.shape[0]): |
indices = rejection_sample(n_neighbors, data.shape[0], rng_state) |
for j in range(indices.shape[0]): |
d = dist(data[i], data[indices[j]], *dist_args) |
heap_push(current_graph, i, d, indices[j], 1) |
heap_push(current_graph, indices[j], d, i, 1) |
|
if rp_tree_init: |
for n in range(leaf_array.shape[0]): |
for i in range(leaf_array.shape[1]): |
if leaf_array[n, i] < 0: |
break |
for j in range(i + 1, leaf_array.shape[1]): |
if leaf_array[n, j] < 0: |
break |
d = dist( |
data[leaf_array[n, i]], data[leaf_array[n, j]], *dist_args |
) |
heap_push( |
current_graph, leaf_array[n, i], d, leaf_array[n, j], 1 |
) |
heap_push( |
current_graph, leaf_array[n, j], d, leaf_array[n, i], 1 |
) |
|
for n in range(n_iters): |
if verbose: |
print("\t", n, " / ", n_iters) |
|
candidate_neighbors = build_candidates( |
current_graph, n_vertices, n_neighbors, max_candidates, rng_state |
) |
|
c = 0 |
for i in range(n_vertices): |
for j in range(max_candidates): |
p = int(candidate_neighbors[0, i, j]) |
if p < 0 or tau_rand(rng_state) < rho: |
continue |
for k in range(max_candidates): |
q = int(candidate_neighbors[0, i, k]) |
if ( |
q < 0 |
or not candidate_neighbors[2, i, j] |
and not candidate_neighbors[2, i, k] |
): |
continue |
|
d = dist(data[p], data[q], *dist_args) |
c += heap_push(current_graph, p, d, q, 1) |
c += heap_push(current_graph, q, d, p, 1) |
|
if c <= delta * n_neighbors * data.shape[0]: |
break |
|
return deheap_sort(current_graph) |
--------------------------------- Fusing loops ---------------------------------
Attempting fusion of parallel loops (combines loops with similar properties)...
----------------------------- Before Optimisation ------------------------------
--------------------------------------------------------------------------------
------------------------------ After Optimisation ------------------------------
Parallel structure is already optimal.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
---------------------------Loop invariant code motion---------------------------
Instruction hoisting:
No instruction hoisting found
--------------------------------------------------------------------------------
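(Side note on reproducing the report above: Numba reads the NUMBA_PARALLEL_DIAGNOSTICS environment variable, with levels 1–4 controlling verbosity; it must be set before numba is first imported. A minimal sketch:

import os
os.environ["NUMBA_PARALLEL_DIAGNOSTICS"] = "4"  # levels 1-4; 4 is most verbose

# Importing umap pulls in numba, which reads the variable; the report is
# printed when the @numba.njit(parallel=True) function is compiled.
import umap

The same report can also be requested from an already-compiled function via its parallel_diagnostics(level=4) method.)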
Top GitHub Comments
If you just want multicore NN search, then the current pynndescent will get you that. If you want distributed/cluster support for NN search, then you may have to wait a while longer.
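For illustration, a minimal sketch of using pynndescent directly for multicore NN search (the data and parameter values here are hypothetical):

import numpy as np
from pynndescent import NNDescent

X = np.random.rand(10000, 50).astype(np.float32)  # hypothetical data

# Index construction runs the numba-parallelized NN-descent under the hood
index = NNDescent(X, n_neighbors=15, metric="euclidean")

# Approximate k-NN graph of the training data: (indices, distances) arrays
knn_indices, knn_dists = index.neighbor_graph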
I am using Numba for the parallelisation, and that (currently; see https://github.com/numba/numba/issues/2713) does not allow dynamic control over how many threads to use. You can, however, set the environment variable NUMBA_NUM_THREADS to restrict the thread pool size, similarly to OpenMP. This is one of the things I would like to see improved before the parallel stuff becomes standard.
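A sketch of capping the thread pool that way (the variable must be set before numba is first imported, and 8 is just an example value):

import os
os.environ["NUMBA_NUM_THREADS"] = "8"  # defaults to the number of CPUs

import umap  # numba is imported here and reads the variable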