UMAP segmentation fault
The following call crashes with a segmentation fault:
umap.UMAP(n_neighbors=11, metric='jaccard', min_dist=0.5).fit(X)
- umap-learn 0.5.1: bad
- umap-learn 0.5.0: bad
- umap-learn 0.4.6: good
However, if n_neighbors is increased to 500, everything works on the newer versions.
Issue Analytics
- Created 2 years ago
- Comments: 8 (5 by maintainers)
Top GitHub Comments
So this was actually quite complicated, because it is a subtle issue. In the end it comes down to the nature of your data and the construction of the random projection trees used to initialize the nearest neighbour search, combined with floating point error accumulation, which ends up causing bad things to happen (tree splits not occurring properly). A long-term fix for that is a little harder, but there is a very simple short-term fix:
Since you are using Jaccard distance, which only cares about whether entries are non-zero or not, you can simply pass in a binarized version of your data and everything should work.
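The reason binarizing is safe can be checked directly: Jaccard distance depends only on which entries are non-zero, so replacing every non-zero value with 1 leaves all pairwise distances unchanged while removing the skewed magnitudes. Below is a minimal NumPy sketch; `jaccard_distance` and `binarize` are hypothetical helpers written for illustration (not umap-learn's own code), and the toy array stands in for the reporter's X, which is not shown in the issue.

```python
import numpy as np


def jaccard_distance(x, y):
    """Jaccard distance computed only from which entries are non-zero."""
    x_nz = x != 0
    y_nz = y != 0
    union = np.count_nonzero(x_nz | y_nz)
    if union == 0:
        return 0.0  # two all-zero rows are treated as identical
    intersection = np.count_nonzero(x_nz & y_nz)
    return (union - intersection) / union


def binarize(X):
    """Hypothetical helper: map every non-zero entry to 1.0, keep zeros as 0.0."""
    return (np.asarray(X) != 0).astype(np.float32)


# Skewed toy data: tiny and huge magnitudes mixed together.
X = np.array([
    [0.0, 3.2, 1e6, 0.0],
    [5e-4, 0.0, 7.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])
X_bin = binarize(X)

# Binarizing leaves every pairwise Jaccard distance unchanged.
for i in range(len(X)):
    for j in range(len(X)):
        assert jaccard_distance(X[i], X[j]) == jaccard_distance(X_bin[i], X_bin[j])
```

The binarized array can then be used in place of the original in the call from the report, e.g. umap.UMAP(n_neighbors=11, metric='jaccard', min_dist=0.5).fit(X_bin).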
Any angular metric will have similar issues. Your data is very skewed. Try using StandardScaler or similar on your data first, to at least get the different columns into the same value ranges.
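What StandardScaler does can be sketched in a few lines of NumPy: each column is centered to mean 0 and divided by its standard deviation, so columns that lived on very different scales end up comparable. This is an illustrative re-implementation under the assumption that StandardScaler's defaults (population standard deviation, constant columns not rescaled) are what you want; `standardize` is a hypothetical helper and the toy array is made up.

```python
import numpy as np


def standardize(X):
    """Hypothetical helper: center each column to mean 0 and scale it to unit
    variance, roughly mirroring sklearn's StandardScaler defaults."""
    X = np.asarray(X, dtype=np.float64)
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # population std (ddof=0), as StandardScaler uses
    std[std == 0.0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return (X - mean) / std


# Toy data whose columns sit in wildly different value ranges.
X = np.array([
    [1.0, 2000.0, 0.001, 5.0],
    [2.0, 8000.0, 0.004, 5.0],
    [3.0, 5000.0, 0.002, 5.0],
])
X_scaled = standardize(X)
```

With scikit-learn installed, sklearn.preprocessing.StandardScaler().fit_transform(X) performs the same transformation, and the scaled array is what would then be passed to UMAP.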