Parallelization not working for nearest neighbors computation in umap 0.4
Hi there,
I’m analyzing single-cell RNA-seq data using scanpy on an Ubuntu virtual machine with 16 CPUs and the following package versions:
- umap-learn 0.4.0 (installed from the 0.4dev branch earlier today)
- pynndescent 0.3.3
- scanpy 1.4.5.dev175+g64f04d8 (installed from the master branch earlier today)
I’m using the development 0.4 version of umap because it is supposed to support parallelized computation of the nearest neighbors. Specifically, scanpy calls:
from umap.umap_ import nearest_neighbors
from sklearn.utils import check_random_state  # where scanpy gets check_random_state

random_state = check_random_state(random_state)
knn_indices, knn_dists, forest = nearest_neighbors(
    X, n_neighbors, random_state=random_state,
    metric=metric, metric_kwds=metric_kwds,
    angular=angular, verbose=verbose,
)
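For a standalone reproduction outside scanpy, the sketch below mirrors that call on synthetic data (the matrix shape, n_neighbors=15, and the Euclidean metric are illustrative choices, not values from the issue):

import numpy as np
from sklearn.utils import check_random_state
from umap.umap_ import nearest_neighbors

# Hypothetical stand-in for the expression matrix
X = np.random.rand(10000, 50).astype(np.float32)

knn_indices, knn_dists, forest = nearest_neighbors(
    X, 15, random_state=check_random_state(42),
    metric="euclidean", metric_kwds={},
    angular=False, verbose=True,
)
print(knn_indices.shape, knn_dists.shape)  # (10000, 15) for both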
The code works but, empirically, it only uses a single CPU. It also gives the following warning message:
/opt/miniconda3/envs/py37_2/lib/python3.7/site-packages/numba/compiler.py:602: NumbaPerformanceWarning:
The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.
To find out why, try turning on parallel diagnostics, see http://numba.pydata.org/numba-doc/latest/user/parallel.html#diagnostics for help.
File "../../../../../opt/miniconda3/envs/py37_2/lib/python3.7/site-packages/umap/nndescent.py", line 47:
@numba.njit(parallel=True)
def nn_descent(
^
self.func_ir.loc))
and when I turn on Numba’s parallel diagnostics, it gives the report below. Any idea what is going on, or whether parallelized approximate nearest neighbor computation is supposed to be supported? Thanks!
================================================================================
Parallel Accelerator Optimizing: Function make_nn_descent.<locals>.nn_descent,
/opt/miniconda3/envs/py37_2/lib/python3.7/site-packages/umap/nndescent.py (46)
================================================================================
Parallel loop listing for Function make_nn_descent.<locals>.nn_descent, /opt/miniconda3/envs/py37_2/lib/python3.7/site-packages/umap/nndescent.py (46)
------------------------------------------------------------------------------------------|loop #ID
@numba.njit(parallel=True) |
def nn_descent( |
data, |
n_neighbors, |
rng_state, |
max_candidates=50, |
n_iters=10, |
delta=0.001, |
rho=0.5, |
rp_tree_init=True, |
leaf_array=None, |
verbose=False, |
): |
n_vertices = data.shape[0] |
|
current_graph = make_heap(data.shape[0], n_neighbors) |
for i in range(data.shape[0]): |
indices = rejection_sample(n_neighbors, data.shape[0], rng_state) |
for j in range(indices.shape[0]): |
d = dist(data[i], data[indices[j]], *dist_args) |
heap_push(current_graph, i, d, indices[j], 1) |
heap_push(current_graph, indices[j], d, i, 1) |
|
if rp_tree_init: |
for n in range(leaf_array.shape[0]): |
for i in range(leaf_array.shape[1]): |
if leaf_array[n, i] < 0: |
break |
for j in range(i + 1, leaf_array.shape[1]): |
if leaf_array[n, j] < 0: |
break |
d = dist( |
data[leaf_array[n, i]], data[leaf_array[n, j]], *dist_args |
) |
heap_push( |
current_graph, leaf_array[n, i], d, leaf_array[n, j], 1 |
) |
heap_push( |
current_graph, leaf_array[n, j], d, leaf_array[n, i], 1 |
) |
|
for n in range(n_iters): |
if verbose: |
print("\t", n, " / ", n_iters) |
|
candidate_neighbors = build_candidates( |
current_graph, n_vertices, n_neighbors, max_candidates, rng_state |
) |
|
c = 0 |
for i in range(n_vertices): |
for j in range(max_candidates): |
p = int(candidate_neighbors[0, i, j]) |
if p < 0 or tau_rand(rng_state) < rho: |
continue |
for k in range(max_candidates): |
q = int(candidate_neighbors[0, i, k]) |
if ( |
q < 0 |
or not candidate_neighbors[2, i, j] |
and not candidate_neighbors[2, i, k] |
): |
continue |
|
d = dist(data[p], data[q], *dist_args) |
c += heap_push(current_graph, p, d, q, 1) |
c += heap_push(current_graph, q, d, p, 1) |
|
if c <= delta * n_neighbors * data.shape[0]: |
break |
|
return deheap_sort(current_graph) |
--------------------------------- Fusing loops ---------------------------------
Attempting fusion of parallel loops (combines loops with similar properties)...
----------------------------- Before Optimisation ------------------------------
--------------------------------------------------------------------------------
------------------------------ After Optimisation ------------------------------
Parallel structure is already optimal.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
---------------------------Loop invariant code motion---------------------------
Instruction hoisting:
No instruction hoisting found
--------------------------------------------------------------------------------
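(Side note on reproducing the report above: Numba reads the NUMBA_PARALLEL_DIAGNOSTICS environment variable, with levels 1–4 controlling verbosity; it must be set before numba is first imported. A minimal sketch:

import os
os.environ["NUMBA_PARALLEL_DIAGNOSTICS"] = "4"  # levels 1-4; 4 is most verbose

# Importing umap pulls in numba, which reads the variable; the report is
# printed when the @numba.njit(parallel=True) function is compiled.
import umap

The same report can also be requested from an already-compiled function via its parallel_diagnostics(level=4) method.)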
Top GitHub Comments
If you just want multicore NN search, then the current pynndescent will get you that. If you want distributed/cluster support for NN search, then you may have to wait a while longer.
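For illustration, a minimal sketch of using pynndescent directly for multicore NN search (the data and parameter values here are hypothetical):

import numpy as np
from pynndescent import NNDescent

X = np.random.rand(10000, 50).astype(np.float32)  # hypothetical data

# Index construction runs the numba-parallelized NN-descent under the hood
index = NNDescent(X, n_neighbors=15, metric="euclidean")

# Approximate k-NN graph of the training data: (indices, distances) arrays
knn_indices, knn_dists = index.neighbor_graph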
I am using Numba for the parallelisation, and that (currently; see https://github.com/numba/numba/issues/2713) does not allow dynamic control over how many threads to use. You can, however, set the environment variable NUMBA_NUM_THREADS to restrict the thread pool size, similarly to OpenMP. This is one of the things I would like to see improved before the parallel stuff becomes standard.
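A sketch of capping the thread pool that way (the variable must be set before numba is first imported, and 8 is just an example value):

import os
os.environ["NUMBA_NUM_THREADS"] = "8"  # defaults to the number of CPUs

import umap  # numba is imported here and reads the variable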