
parallelization not working for nearest neighbors computation in umap 0.4

See original GitHub issue

Hi There,

I’m analyzing single-cell RNA-seq data using scanpy on an Ubuntu virtual machine with 16 CPUs and the following package versions:

  • umap-learn 0.4.0 (installed from the 0.4dev branch earlier today)
  • pynndescent 0.3.3
  • scanpy 1.4.5.dev175+g64f04d8 (installed from the master branch earlier today)

I’m using the development 0.4 version of umap because it is supposed to support parallelized computation of the nearest neighbors. Specifically, scanpy calls:

from umap.umap_ import nearest_neighbors
from sklearn.utils import check_random_state

random_state = check_random_state(random_state)

knn_indices, knn_dists, forest = nearest_neighbors(
        X, n_neighbors, random_state=random_state,
        metric=metric, metric_kwds=metric_kwds,
        angular=angular, verbose=verbose,
)

The code works, but empirically it only uses a single CPU. It also gives the following warning message:

/opt/miniconda3/envs/py37_2/lib/python3.7/site-packages/numba/compiler.py:602: NumbaPerformanceWarning: 
The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.

To find out why, try turning on parallel diagnostics, see http://numba.pydata.org/numba-doc/latest/user/parallel.html#diagnostics for help.

File "../../../../../opt/miniconda3/envs/py37_2/lib/python3.7/site-packages/umap/nndescent.py", line 47:
    @numba.njit(parallel=True)
    def nn_descent(
    ^

  self.func_ir.loc))

and when I turn on Numba's parallel diagnostics, it gives the report below. Any idea what is going on, or whether parallelized approximate nearest-neighbor computation is supposed to be supported? Thanks!

 
================================================================================
 Parallel Accelerator Optimizing:  Function make_nn_descent.<locals>.nn_descent,
 /opt/miniconda3/envs/py37_2/lib/python3.7/site-packages/umap/nndescent.py (46)
  
================================================================================


Parallel loop listing for  Function make_nn_descent.<locals>.nn_descent, /opt/miniconda3/envs/py37_2/lib/python3.7/site-packages/umap/nndescent.py (46) 
------------------------------------------------------------------------------------------|loop #ID
    @numba.njit(parallel=True)                                                            | 
    def nn_descent(                                                                       | 
        data,                                                                             | 
        n_neighbors,                                                                      | 
        rng_state,                                                                        | 
        max_candidates=50,                                                                | 
        n_iters=10,                                                                       | 
        delta=0.001,                                                                      | 
        rho=0.5,                                                                          | 
        rp_tree_init=True,                                                                | 
        leaf_array=None,                                                                  | 
        verbose=False,                                                                    | 
    ):                                                                                    | 
        n_vertices = data.shape[0]                                                        | 
                                                                                          | 
        current_graph = make_heap(data.shape[0], n_neighbors)                             | 
        for i in range(data.shape[0]):                                                    | 
            indices = rejection_sample(n_neighbors, data.shape[0], rng_state)             | 
            for j in range(indices.shape[0]):                                             | 
                d = dist(data[i], data[indices[j]], *dist_args)                           | 
                heap_push(current_graph, i, d, indices[j], 1)                             | 
                heap_push(current_graph, indices[j], d, i, 1)                             | 
                                                                                          | 
        if rp_tree_init:                                                                  | 
            for n in range(leaf_array.shape[0]):                                          | 
                for i in range(leaf_array.shape[1]):                                      | 
                    if leaf_array[n, i] < 0:                                              | 
                        break                                                             | 
                    for j in range(i + 1, leaf_array.shape[1]):                           | 
                        if leaf_array[n, j] < 0:                                          | 
                            break                                                         | 
                        d = dist(                                                         | 
                            data[leaf_array[n, i]], data[leaf_array[n, j]], *dist_args    | 
                        )                                                                 | 
                        heap_push(                                                        | 
                            current_graph, leaf_array[n, i], d, leaf_array[n, j], 1       | 
                        )                                                                 | 
                        heap_push(                                                        | 
                            current_graph, leaf_array[n, j], d, leaf_array[n, i], 1       | 
                        )                                                                 | 
                                                                                          | 
        for n in range(n_iters):                                                          | 
            if verbose:                                                                   | 
                print("\t", n, " / ", n_iters)                                            | 
                                                                                          | 
            candidate_neighbors = build_candidates(                                       | 
                current_graph, n_vertices, n_neighbors, max_candidates, rng_state         | 
            )                                                                             | 
                                                                                          | 
            c = 0                                                                         | 
            for i in range(n_vertices):                                                   | 
                for j in range(max_candidates):                                           | 
                    p = int(candidate_neighbors[0, i, j])                                 | 
                    if p < 0 or tau_rand(rng_state) < rho:                                | 
                        continue                                                          | 
                    for k in range(max_candidates):                                       | 
                        q = int(candidate_neighbors[0, i, k])                             | 
                        if (                                                              | 
                            q < 0                                                         | 
                            or not candidate_neighbors[2, i, j]                           | 
                            and not candidate_neighbors[2, i, k]                          | 
                        ):                                                                | 
                            continue                                                      | 
                                                                                          | 
                        d = dist(data[p], data[q], *dist_args)                            | 
                        c += heap_push(current_graph, p, d, q, 1)                         | 
                        c += heap_push(current_graph, q, d, p, 1)                         | 
                                                                                          | 
            if c <= delta * n_neighbors * data.shape[0]:                                  | 
                break                                                                     | 
                                                                                          | 
        return deheap_sort(current_graph)                                                 | 
--------------------------------- Fusing loops ---------------------------------
Attempting fusion of parallel loops (combines loops with similar properties)...
----------------------------- Before Optimisation ------------------------------
--------------------------------------------------------------------------------
------------------------------ After Optimisation ------------------------------
Parallel structure is already optimal.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
 
---------------------------Loop invariant code motion---------------------------

Instruction hoisting:
No instruction hoisting found
--------------------------------------------------------------------------------
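For reference, the report above is the kind of output Numba produces when its parallel diagnostics are enabled; per the Numba documentation this can be done process-wide with an environment variable (levels 1–4, higher is more verbose):

```shell
# Enable Numba's parallel diagnostics (levels 1-4; higher = more verbose).
# Must be set before the Python process that compiles the function starts.
export NUMBA_PARALLEL_DIAGNOSTICS=4
```

Alternatively, once a jitted function has been compiled, calling `some_jitted_function.parallel_diagnostics(level=4)` prints the same report for that one function.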

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Comments: 13 (7 by maintainers)

Top GitHub Comments

2 reactions
lmcinnes commented, Jun 20, 2020

If you just want multicore NN search then the current pynndescent will get you that. If you want distributed/cluster support for NN search then you may have to wait a while longer.

1 reaction
lmcinnes commented, Nov 13, 2019

I am using Numba for the parallelisation, and that (currently; see https://github.com/numba/numba/issues/2713) does not allow dynamic handling of how many threads to use. You can, however, set the environment variable NUMBA_NUM_THREADS to restrict the thread pool size, similar to OpenMP. This is one of the things that I would like to see improved before the parallel stuff becomes standard.
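A minimal sketch of that workaround (the variable must be set before Numba is first imported in the process; the thread count of 8 is an arbitrary example):

```python
import os

# Cap Numba's thread pool, analogous to OMP_NUM_THREADS for OpenMP.
# This must happen before the first `import numba` in the process,
# since the pool size is fixed at import time.
os.environ["NUMBA_NUM_THREADS"] = "8"

# import numba  # any Numba-using import (umap, pynndescent, ...) goes after this point
```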


Top Results From Across the Web

UMAP API Guide — umap 0.5 documentation - Read the Docs
Running in parallel is non-deterministic, and is not used if a random seed has been set, to ensure reproducibility.
Frequently Asked Questions — umap 0.5 documentation
For some datasets the default options for approximate nearest neighbor search can result in excessive memory use. If your dataset is not especially...
umap.umap_ — umap 0.3 documentation - Read the Docs
This may be exact, but more likely is approximated via nearest neighbor descent. ... Running in parallel is non-deterministic, and is not used...
umap.umap_ — umap 0.5 documentation - Read the Docs
f"This is not a problem as no vertices were disconnected. ... the k-neighbor graph of. n_neighbors: int The number of nearest neighbors to...
Precomputed k-nn — umap 0.5 documentation - Read the Docs
Instead, we can compute the knn for the largest n_neighbors we wish to analyze and then feed that precomputed_knn to UMAP. UMAP will...
