UMAP segmentation fault

import umap

# X is the reporter's data matrix (not included in the issue)
umap.UMAP(n_neighbors=11, metric='jaccard', min_dist=0.5).fit(X)

Error - segmentation fault

umap-learn 0.5.1 - Bad
umap-learn 0.5.0 - Bad
umap-learn 0.4.6 - Good

However, if you increase n_neighbors to 500, everything works on the newer versions as well.

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

lmcinnes commented, Jul 26, 2021 (2 reactions)

This turned out to be quite complicated because the issue is subtle. It comes down to the nature of your data and the construction of the random projection trees that initialize the nearest neighbour search, combined with accumulated floating point error, which ends up causing tree splits not to occur properly. A long-term fix for that is a little harder, but there is a very simple short-term fix:

Since you are using Jaccard distance, which only cares about whether entries are non-zero or not, you can simply pass in a binarized version of your data and everything should work.

umap.UMAP(n_neighbors=11, metric='jaccard', 
                      min_dist=0.5).fit(X != 0)
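
To illustrate why binarizing loses nothing under Jaccard distance, here is a minimal sketch in plain NumPy. The helper `jaccard_distance` and the synthetic matrix `X` are assumptions made for illustration; this is not umap-learn's internal implementation, only the textbook definition applied to non-zero patterns.

```python
import numpy as np

def jaccard_distance(a, b):
    """Jaccard distance between the non-zero patterns of two vectors.

    Hypothetical helper for illustration only.
    """
    a_nz = np.asarray(a) != 0
    b_nz = np.asarray(b) != 0
    union = np.logical_or(a_nz, b_nz).sum()
    if union == 0:
        # Two all-zero vectors are treated as identical.
        return 0.0
    intersection = np.logical_and(a_nz, b_nz).sum()
    return 1.0 - intersection / union

rng = np.random.default_rng(0)
# Synthetic sparse data whose non-zero values span many orders of magnitude.
X = rng.poisson(0.3, size=(5, 20)) * rng.uniform(1.0, 1e6, size=(5, 20))
X_bin = X != 0  # the binarized version suggested above

# Pairwise distances computed from the raw and the binarized data.
d_raw = np.array([[jaccard_distance(r, s) for s in X] for r in X])
d_bin = np.array([[jaccard_distance(r, s) for s in X_bin] for r in X_bin])

# Only the non-zero pattern matters, so both matrices are identical.
same = np.array_equal(d_raw, d_bin)
```

Because the magnitudes never enter the distance, passing `X != 0` changes nothing about which points are neighbours under Jaccard; it only changes the values that the random projection trees see.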

lmcinnes commented, Jul 27, 2021 (0 reactions)

Any angular metric will have similar issues. Your data is very skewed. Try using StandardScaler or similar on your data first to at least get the different columns in the same value ranges.
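
As a rough sketch of that suggestion, the snippet below standardizes a skewed matrix with scikit-learn's `StandardScaler` before it would be handed to UMAP. The synthetic data and variable names are assumptions for illustration. Note that this applies to angular metrics such as cosine; centering makes almost every entry non-zero, so it is not a substitute for binarizing when the metric is Jaccard.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical skewed data: columns differ by orders of magnitude.
X = np.column_stack([
    rng.normal(0.0, 1.0, size=100),
    rng.normal(0.0, 1e3, size=100),
    rng.exponential(1e6, size=100),
])

# Rescale every column to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)

# X_scaled could then be passed to UMAP with an angular metric, e.g.
# umap.UMAP(metric='cosine').fit(X_scaled)
```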


