umap crashes in my computer with 900,000 points

Hi, I have been trying to embed 900,000 points using UMAP on my computer. The program eventually gets killed by the system. I tried running it both in Jupyter and in a terminal.

My system: 16-core/32-thread AMD CPU, 128GB RAM (the terminal reports 125GB). Ubuntu 18.04.3 LTS.

I was wondering whether this is a system requirement issue or an issue in how UMAP handles this many points. (According to the paper, UMAP can handle millions of points, as there is a visualization of 3 million points.)

Here is code that reproduces the error on my computer:

import numpy as np
from sklearn.decomposition import PCA
import umap

# Synthetic data: 900,000 points with 1,000 features each.
X_main = np.random.rand(900000, 1000)

# Reduce to 50 dimensions with PCA before running UMAP.
pca = PCA(n_components=50)
X_train = pca.fit_transform(X_main)

n_neighbors = 50
MIN_DIST = 0.1

ump = umap.UMAP(n_neighbors=n_neighbors,
                min_dist=MIN_DIST,
                n_components=2,
                random_state=100,
                metric='euclidean')

# Embed the PCA output into 2 dimensions.
y_umap = ump.fit_transform(X_train)
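
For a sense of scale, here is a back-of-the-envelope sketch of how much memory the two arrays in the reproduction occupy. It assumes float64 values (what np.random.rand returns) and uses a hypothetical helper, array_gb. It covers only the arrays themselves, not PCA's working copies or UMAP's internal structures, which this sketch does not try to estimate.

```python
def array_gb(rows, cols, bytes_per_value=8):
    """Size of a dense rows x cols array in decimal gigabytes.

    bytes_per_value=8 assumes float64, the dtype np.random.rand returns.
    """
    return rows * cols * bytes_per_value / 1e9


n_points = 900_000
raw_gb = array_gb(n_points, 1000)  # X_main: 900,000 x 1,000
pca_gb = array_gb(n_points, 50)    # X_train: 900,000 x 50

print(f"X_main:  {raw_gb:.2f} GB")
print(f"X_train: {pca_gb:.2f} GB")
```

Both arrays are small next to 128GB of RAM, so if memory is the cause, the pressure would have to come from intermediate data rather than from the inputs themselves.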

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 1
  • Comments: 8 (2 by maintainers)

Top GitHub Comments

1 reaction
lmcinnes commented, Oct 1, 2020

The most likely reason for a silent crash, with the system killing the job, is a memory issue. UMAP can be pretty memory hungry (newer development versions are working to fix this). One option is to try low_memory=True, which uses a sometimes slower but less memory-hungry approach. Another option is to install the latest version of pynndescent (version 0.5 or newer).
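
As a rough sketch of the two suggestions above, the reproduction's settings can be passed to UMAP with low_memory=True, and the installed pynndescent version can be checked from Python. The helper names (umap_settings, embed, pynndescent_version) are hypothetical and exist only for illustration. Whether this avoids the crash on a given machine is not something the thread confirms.

```python
from importlib.metadata import PackageNotFoundError, version


def umap_settings(low_memory=True):
    """UMAP keyword arguments from the issue, plus the low_memory flag."""
    return dict(
        n_neighbors=50,
        min_dist=0.1,
        n_components=2,
        random_state=100,
        metric="euclidean",
        low_memory=low_memory,
    )


def embed(X, **overrides):
    """Fit UMAP on X with the settings above (needs umap-learn installed)."""
    import umap  # imported lazily so the helpers above work without it

    settings = {**umap_settings(), **overrides}
    return umap.UMAP(**settings).fit_transform(X)


def pynndescent_version():
    """Return the installed pynndescent version string, or None if absent."""
    try:
        return version("pynndescent")
    except PackageNotFoundError:
        return None
```

Calling embed(X_train) would then stand in for the ump.fit_transform(X_train) line in the original script, and pynndescent_version() shows whether the installed release is 0.5 or newer.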

0 reactions
vb690 commented, Sep 21, 2021

Thank you for your answers! I have a couple of new insights on this:

  1. Running multiple consecutive instances of UMAP doesn’t seem to be the problem (I can easily reduce many large datasets one after the other without running into memory problems).
  2. Changing the distance metric from “cosine” to “euclidean” did solve the silent crash problem (I assume because it is less expensive?).