-1 entries in neighbor_graph

I get some -1 entries in the kNN graph when I build it on a sparse matrix with cosine distance. Is this intentional? Does -1 mean X.shape[0]-1? When I run query(X), I don’t get any negative elements. E.g.:

import numpy as np
import pynndescent, scipy.sparse
X = scipy.sparse.csr_matrix(np.random.randn(10000,1247)>3)
nn = pynndescent.NNDescent(X, metric='cosine', n_neighbors=15)

Now nn.neighbor_graph[0] has some values equal to -1 but nn.query(X, k=15)[0] does not.

Update: I forgot to say that I do get a warning from NNDescent: “UserWarning: Failed to correctly find n_neighbors for some samples. Results may be less than ideal. Try re-running with different parameters.” Maybe that’s what -1 indicates? But then query() does not return any negative elements and does not complain. How should one approach this?
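
To see how widespread the -1 entries are, a small helper along these lines may be useful. It is only a sketch: it assumes that -1 marks a neighbor slot that was never filled, which the warning suggests but this thread does not confirm. The function name `summarize_missing` and the example array are hypothetical; the array stands in for `nn.neighbor_graph[0]`.

```python
import numpy as np

def summarize_missing(indices):
    """Count -1 entries in a (n_samples, n_neighbors) index array."""
    indices = np.asarray(indices)
    missing = indices == -1
    per_row = missing.sum(axis=1)  # number of -1 slots in each row
    return {
        "total_missing": int(missing.sum()),
        "rows_affected": int((per_row > 0).sum()),
        "rows_all_missing": int((per_row == indices.shape[1]).sum()),
    }

# Hypothetical stand-in for nn.neighbor_graph[0]; not real pynndescent output.
example_indices = np.array([
    [0, 3, 2],
    [1, -1, -1],
    [-1, -1, -1],
    [3, 0, 1],
])

summary = summarize_missing(example_indices)
print(summary)
```

With a real index, passing `nn.neighbor_graph[0]` instead of the example array would show whether the -1 entries are scattered across many rows or concentrated in a few rows that have no neighbors at all.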

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 11 (4 by maintainers)

Top GitHub Comments

1 reaction
dkobak commented, May 12, 2020

For your information, I’ve been looking at this dataset: https://johnhw.github.io/umap_primes/index.md.html. Turns out, this (all neighbors -1) happens for around 10% of points (at least out of the first 100K numbers). It seems these are mostly prime numbers and UMAP mostly keeps them where they were at initialization.

0 reactions
camiel-m commented, Sep 3, 2020

Hi Leland, I ran into a similar issue with -1’s in the neighbor_graph.

When I run the snippet below I see that there are no -1’s in the graph.

import numpy as np
import scipy.sparse
from pynndescent import NNDescent

X = scipy.sparse.csr_matrix(np.random.randn(10000,1247)>3)

nn = NNDescent(data=X, metric="cosine", n_neighbors=10)
indices, distances = nn.neighbor_graph
np.sum(nn.neighbor_graph[0]==-1)

output: 0

However, if I query the graph, there are suddenly -1’s.

indices2, distances2 = nn.query(X)
np.sum(nn.neighbor_graph[0]==-1)

output: 62251

This seems odd since the underlying graph shouldn’t be affected by querying (new) data. The number of -1’s varies when using different metrics, but there’s always some.
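
One way to confirm that the stored graph really changes is to copy it before the query and compare afterwards. The sketch below uses plain NumPy arrays as a stand-in for the `(indices, distances)` pair, so it does not exercise pynndescent itself; the helper names `snapshot` and `graph_unchanged` are hypothetical.

```python
import numpy as np

def snapshot(graph):
    """Return independent copies of an (indices, distances) pair."""
    indices, distances = graph
    return np.array(indices, copy=True), np.array(distances, copy=True)

def graph_unchanged(before, after):
    """True if both arrays are element-wise identical.

    Note: NaN distances compare as unequal under np.array_equal.
    """
    return (np.array_equal(before[0], after[0])
            and np.array_equal(before[1], after[1]))

# Stand-in arrays; with a real index these would come from nn.neighbor_graph.
idx = np.array([[0, 2], [1, 0], [2, 1]])
dist = np.array([[0.0, 0.4], [0.0, 0.3], [0.0, 0.5]])

before = snapshot((idx, dist))
same_before_mutation = graph_unchanged(before, (idx, dist))

idx[0, 1] = -1  # simulate an in-place modification of the stored graph
same_after_mutation = graph_unchanged(before, (idx, dist))

print(same_before_mutation, same_after_mutation)
```

Applied to the case above, this would mean calling `snapshot(nn.neighbor_graph)` before `nn.query(X)` and `graph_unchanged(...)` with a fresh `nn.neighbor_graph` afterwards. The copy matters: if the graph is modified in place, holding only a reference to the original arrays would hide the change.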

I’m on pynndescent 0.48.1 and numba 0.49.1.
