Systematically determine `min_dist` and `n_neighbors`

I tested multiple combinations of min_dist and n_neighbors on my data and found that a suitable combination of these hyperparameters can separate all the clusters (according to the labels). How can I determine these hyperparameters systematically, rather than by visualizing the data after embedding?
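
One possible way to make the search systematic when labels are available, as the question describes, is to score each embedding numerically and sweep a grid. The sketch below is only an illustration under assumptions, not a method proposed in the thread: `knn_label_agreement`, `grid_search`, and `umap_embed` are hypothetical helper names, the score (how often a point's label matches the majority label of its nearest neighbors in the embedding) is one choice among many, and `umap_embed` assumes the `umap-learn` package is installed.

```python
import itertools
import math
from collections import Counter


def knn_label_agreement(points, labels, k=5):
    """Fraction of points whose label equals the majority label of their
    k nearest neighbors (Euclidean distance) in the embedding."""
    hits = 0
    for i, p in enumerate(points):
        # Indices of the k closest other points to p.
        nearest = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )[:k]
        majority = Counter(labels[j] for j in nearest).most_common(1)[0][0]
        hits += majority == labels[i]
    return hits / len(points)


def grid_search(embed, data, labels, n_neighbors_grid, min_dist_grid, k=5):
    """Score every (n_neighbors, min_dist) pair; return the best and all results.

    `embed` is any callable embed(data, n_neighbors=..., min_dist=...)
    that returns a list of coordinate lists.
    """
    results = []
    for n, d in itertools.product(n_neighbors_grid, min_dist_grid):
        embedding = embed(data, n_neighbors=n, min_dist=d)
        results.append((knn_label_agreement(embedding, labels, k), n, d))
    best = max(results, key=lambda r: r[0])
    return best, results


def umap_embed(data, n_neighbors, min_dist):
    """Assumed adapter around umap-learn; imported lazily so the helpers
    above work without it."""
    import umap

    reducer = umap.UMAP(n_neighbors=n_neighbors, min_dist=min_dist, random_state=42)
    return reducer.fit_transform(data).tolist()
```

A call such as `grid_search(umap_embed, X, y, [5, 15, 50], [0.0, 0.1, 0.5])` would then return the best-scoring triple `(score, n_neighbors, min_dist)`. Because the score uses the labels, the chosen setting is tuned to them; checking it on held-out data is advisable.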

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

2 reactions
lmcinnes commented, Jan 22, 2019

The n_neighbors parameter in UMAP and the min_samples parameter in DBSCAN/HDBSCAN mean different things. They are, at least, measured in the same units. There is some reason to believe one could construct a clustering algorithm that would make use of the UMAP n_neighbors in a similar way to DBSCAN/HDBSCAN. As it stands, however, I wouldn’t say they should be tied or linked. I would say, however, that they should probably be on roughly the same scale (I would expect them to be the same order of magnitude in general, for instance).
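
As a rough illustration of the "same order of magnitude" guideline, the sketch below checks the two values before embedding and clustering. It is a minimal example under assumptions rather than anything prescribed in the comment: `same_order_of_magnitude` and `embed_then_cluster` are hypothetical names, "same order of magnitude" is interpreted as differing by less than a factor of 10, the defaults (`n_neighbors=30`, `min_samples=10`, `min_dist=0.0`) are illustrative choices, and the pipeline assumes the `umap-learn` and `hdbscan` packages are installed.

```python
import math


def same_order_of_magnitude(a, b):
    """True when a and b differ by less than a factor of 10
    (one possible reading of 'same order of magnitude')."""
    return abs(math.log10(a / b)) < 1


def embed_then_cluster(data, n_neighbors=30, min_samples=10):
    """Embed with UMAP, then cluster the embedding with HDBSCAN.

    The two parameters are kept independent, but the function refuses
    combinations that are far apart in scale.
    """
    if not same_order_of_magnitude(n_neighbors, min_samples):
        raise ValueError("n_neighbors and min_samples differ by 10x or more")

    # Imported lazily: both packages are assumptions of this sketch.
    import hdbscan
    import umap

    # min_dist=0.0 is an illustrative choice that packs points tightly.
    embedding = umap.UMAP(n_neighbors=n_neighbors, min_dist=0.0).fit_transform(data)
    return hdbscan.HDBSCAN(min_samples=min_samples).fit_predict(embedding)
```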

1 reaction
MattWenham commented, Feb 13, 2019

Start here for the best description of the algorithm: https://hal.archives-ouvertes.fr/hal-01461451

My implementation uses somoclu as the SOM engine. It’s trapped in a Jupyter notebook at the moment, but I will try to get that shared at some point soon.
