why use np.max instead of np.min in smooth_knn_dist

I thought it should be the nearest neighbor, which has the minimum distance?

https://github.com/lmcinnes/umap/blob/master/umap/umap_.py#L179

rho[i] = np.max(non_zero_dists)

Issue Analytics

Created 3 years ago
Comments: 8 (2 by maintainers)

Top GitHub Comments

2 reactions

jlmelville commented, Apr 23, 2020

The way it’s currently done in UMAP uses the symmetrized normalized Laplacian (L_sym in the notation used by von Luxburg in that tutorial). For that, you need the dim + 1 eigenvectors with the smallest eigenvalues, ignoring the very smallest one.
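To make the L_sym recipe concrete, here is a minimal NumPy sketch. It is not UMAP's actual code: the helper name `sym_laplacian_embedding` and the toy weighted graph are made up for illustration, and it assumes a small dense symmetric adjacency matrix with no isolated nodes.

```python
import numpy as np


def sym_laplacian_embedding(A, dim):
    """Hypothetical helper (not UMAP's API): embed a graph using L_sym."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)  # assumes every node has degree > 0
    # L_sym = I - D^{-1/2} A D^{-1/2}
    L_sym = np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric input
    eigvals, eigvecs = np.linalg.eigh(L_sym)
    # take the dim + 1 bottom eigenvectors, then drop the very first one
    return eigvals[: dim + 1], eigvecs[:, 1 : dim + 1]


# Toy connected graph with symmetric edge weights (illustrative only).
A = np.array(
    [
        [0, 1, 2, 0, 0],
        [1, 0, 1, 3, 0],
        [2, 1, 0, 1, 1],
        [0, 3, 1, 0, 2],
        [0, 0, 1, 2, 0],
    ],
    dtype=float,
)

eigvals, embedding = sym_laplacian_embedding(A, dim=2)
print(eigvals)          # the first value is ~0 for a connected graph
print(embedding.shape)  # one row of coordinates per node
```

For a connected graph the smallest eigenvalue is zero (up to rounding) and its eigenvector carries no layout information, which is why it gets dropped.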

You could instead use the random walk transition matrix, P, in which case you would want the dim + 1 eigenvectors with the largest eigenvalues, ignoring the top eigenvector. Those are equivalent to the dim + 1 bottom eigenvectors of the random walk Laplacian, L_rw (once again ignoring the bottom eigenvector). That’s basically the same procedure as Laplacian Eigenmaps.
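The relationship between P, L_rw and L_sym can be checked numerically. The sketch below uses a made-up toy graph and plain NumPy, and assumes a connected graph with symmetric weights. It verifies that the eigenvalues of P are one minus those of L_sym, and that rescaling an L_sym eigenvector by D^{-1/2} gives an eigenvector of P (and therefore of L_rw = I - P).

```python
import numpy as np

# Toy connected graph with symmetric edge weights (illustrative only).
A = np.array(
    [
        [0, 1, 2, 0, 0],
        [1, 0, 1, 3, 0],
        [2, 1, 0, 1, 1],
        [0, 3, 1, 0, 2],
        [0, 0, 1, 2, 0],
    ],
    dtype=float,
)
n = A.shape[0]
deg = A.sum(axis=1)
d_inv_sqrt = 1.0 / np.sqrt(deg)

P = A / deg[:, None]   # random walk transition matrix D^{-1} A
L_rw = np.eye(n) - P   # random walk Laplacian
L_sym = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

# L_sym is symmetric, so eigh applies (ascending eigenvalues).
sym_vals, sym_vecs = np.linalg.eigh(L_sym)

# P is not symmetric, so use the general solver and sort descending.
p_vals = np.sort(np.linalg.eigvals(P).real)[::-1]

# Largest eigenvalues of P correspond to smallest eigenvalues of L_sym.
vals_match = np.allclose(p_vals, 1.0 - sym_vals)

# If L_sym u = lam * u, then v = D^{-1/2} u satisfies P v = (1 - lam) v.
rw_vecs = d_inv_sqrt[:, None] * sym_vecs
p_residual = P @ rw_vecs - rw_vecs * (1.0 - sym_vals)
rw_residual = L_rw @ rw_vecs - rw_vecs * sym_vals

print(vals_match)
print(np.abs(p_residual).max(), np.abs(rw_residual).max())
```

So the two choices share eigenvalues (up to the 1 - lambda flip) and differ only in a per-node rescaling of the eigenvectors by the inverse square root of the degree.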

In the tutorial, on various theoretical grounds, von Luxburg suggests that L_rw is superior to L_sym for spectral clustering. But I’ve not noticed that one is superior to the other for the purposes of initializing UMAP.

1 reaction

lmcinnes commented, Apr 23, 2020

It depends on the choice of Laplacian (there are several) and a few other things. I got some advice from some experts in the area and went with that. I did recently get some alternative approaches suggested from another expert in spectral methods on graphs, but haven’t had time to explore them yet.