Systematically determine `min_dist` and `n_neighbors`

I tested multiple combinations of min_dist and n_neighbors on my data and found that a suitable combination of these hyperparameters can separate all the clusters (according to the labels). I wonder how to determine these hyperparameters systematically, rather than by visualizing the data after embedding.
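One possible way to make this search systematic, sketched here under assumptions rather than taken from the thread, is to score each (n_neighbors, min_dist) pair numerically against the known labels instead of inspecting plots. The sketch below uses the mean silhouette coefficient as the score, which is a choice made for illustration; distances in an embedding are only a rough proxy for separation. The names `silhouette`, `grid_search`, and `toy_embed` are hypothetical. `toy_embed` is an arbitrary stand-in so the example runs with the standard library alone; it is not a model of how UMAP behaves.

```python
import itertools
import math


def silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance (pure Python)."""
    n = len(points)
    scores = []
    for i in range(n):
        # Distances from point i to every other point, grouped by label.
        by_label = {}
        for j in range(n):
            if i != j:
                d = math.dist(points[i], points[j])
                by_label.setdefault(labels[j], []).append(d)
        own = by_label.pop(labels[i], None)
        if not own or not by_label:
            scores.append(0.0)  # singleton cluster, or only one label present
            continue
        a = sum(own) / len(own)  # mean distance inside the point's own cluster
        b = min(sum(d) / len(d) for d in by_label.values())  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


def grid_search(embed, data, labels, n_neighbors_grid, min_dist_grid):
    """Score every (n_neighbors, min_dist) pair; return results, best first."""
    results = []
    for n_neighbors, min_dist in itertools.product(n_neighbors_grid, min_dist_grid):
        embedding = embed(data, n_neighbors, min_dist)
        results.append((silhouette(embedding, labels), n_neighbors, min_dist))
    return sorted(results, reverse=True)


def toy_embed(data, n_neighbors, min_dist):
    # ASSUMPTION: arbitrary stand-in for a real embedding, used only so the
    # sketch runs without extra packages. It stretches the y-axis by a factor
    # that depends on both hyperparameters.
    spread = (1 + 10 * min_dist) * (15 / n_neighbors)
    return [(x, y * spread) for x, y in data]


# With umap-learn installed, the embed function could instead be:
# def umap_embed(data, n_neighbors, min_dist):
#     import umap
#     reducer = umap.UMAP(n_neighbors=n_neighbors, min_dist=min_dist, random_state=0)
#     return reducer.fit_transform(data).tolist()

data = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (10.0, 0.0), (10.0, 1.0), (10.0, 2.0)]
labels = [0, 0, 0, 1, 1, 1]
results = grid_search(toy_embed, data, labels, [5, 15, 50], [0.0, 0.1, 0.5])
best_score, best_n_neighbors, best_min_dist = results[0]
print(best_score, best_n_neighbors, best_min_dist)
```

Because UMAP is stochastic, fixing the seed (or averaging scores over several seeds) would likely be needed for the ranking to be stable. The pure-Python silhouette is quadratic in the number of points, so on real data a library implementation or a subsample would be more practical.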

Issue Analytics

State:
Created 5 years ago
Comments: 9 (3 by maintainers)

Top GitHub Comments

2 reactions

lmcinnes commented, Jan 22, 2019

The n_neighbors parameter in UMAP and the min_samples parameter in DBSCAN/HDBSCAN mean different things. They are, at least, measured in the same units. There is some reason to believe one could construct a clustering algorithm that would make use of the UMAP n_neighbors in a similar way to DBSCAN/HDBSCAN. As it stands, however, I wouldn’t say they should be tied or linked. I would say that they should probably be on roughly the same scale (I would expect them to be the same order of magnitude in general, for instance).

1 reaction

MattWenham commented, Feb 13, 2019

Start here for the best description of the algorithm: https://hal.archives-ouvertes.fr/hal-01461451

My implementation uses somoclu as the SOM engine. It’s trapped in a Jupyter notebook at the moment, but I will try to get that shared at some point soon.