Systematically determine `min_dist` and `n_neighbors`
I tested multiple combinations of `min_dist` and `n_neighbors` on my data and found that a suitable combination of these hyperparameters can separate all the clusters (according to the labels). I wonder how to determine these hyperparameters systematically, rather than by visualizing the data after embedding.
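One way to make the search described above less manual is a plain grid search that scores each embedding against the known labels. The sketch below is only an illustration under stated assumptions, not a method endorsed in this thread: the helper names (`grid_search_umap_params`, `umap_embed`, `label_silhouette`) are hypothetical, the silhouette score is just one possible measure of label separation, and the grids are arbitrary. It assumes `umap-learn` and `scikit-learn` are installed. Distances in a UMAP embedding are distorted, so a high score here may not reflect structure in the original space.

```python
from itertools import product


def grid_search_umap_params(X, labels, embed_fn, score_fn,
                            n_neighbors_grid, min_dist_grid):
    """Score every (n_neighbors, min_dist) pair; return results best-first.

    embed_fn(X, n_neighbors, min_dist) -> embedding
    score_fn(embedding, labels) -> float, where higher means better separation
    """
    results = []
    for n_neighbors, min_dist in product(n_neighbors_grid, min_dist_grid):
        embedding = embed_fn(X, n_neighbors, min_dist)
        results.append((score_fn(embedding, labels), n_neighbors, min_dist))
    # Sort by score only, so ties keep their grid order.
    results.sort(key=lambda r: r[0], reverse=True)
    return results


def umap_embed(X, n_neighbors, min_dist):
    """Hypothetical helper: embed X with UMAP (imported lazily)."""
    import umap  # provided by the umap-learn package
    reducer = umap.UMAP(n_neighbors=n_neighbors, min_dist=min_dist,
                        random_state=42)
    return reducer.fit_transform(X)


def label_silhouette(embedding, labels):
    """Hypothetical scorer: silhouette of the known labels in the embedding."""
    from sklearn.metrics import silhouette_score
    return silhouette_score(embedding, labels)


# Example call (X and y are your own data and labels):
# results = grid_search_umap_params(X, y, umap_embed, label_silhouette,
#                                   [5, 15, 50], [0.0, 0.1, 0.5])
# best_score, best_n_neighbors, best_min_dist = results[0]
```

Passing the embedding and scoring functions as arguments keeps the search loop independent of the metric, so a different criterion (for example, classifier accuracy on held-out points) can be swapped in without changing the loop.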
Issue Analytics
- State:
- Created 5 years ago
- Comments:9 (3 by maintainers)
Top GitHub Comments
The `n_neighbors` parameter in UMAP and the `n_samples` parameter in DBSCAN/HDBSCAN mean different things. They are, at least, measured in the same units. There is some reason to believe one could construct a clustering algorithm that would make use of the UMAP `n_neighbors` in a similar way to DBSCAN/HDBSCAN. As it stands, however, I wouldn’t say they should be tied or linked. I would say, though, that they should probably be on roughly the same scale (I would expect them to be the same order of magnitude in general, for instance).

Start here for the best description of the algorithm: https://hal.archives-ouvertes.fr/hal-01461451
My implementation uses `somoclu` as the SOM engine. It’s trapped in a Jupyter notebook at the moment, but I will try to get that shared at some point soon.