update function of UMAP does not work

I'm trying to build an incremental trainer for UMAP that updates on batches of data. I'm testing this out with MNIST.

import numpy as np
import sklearn.datasets
import umap
import umap.utils as utils
import umap.aligned_umap
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA

mnist = fetch_openml('mnist_784', version=1)
mnist.target = mnist.target.astype(int)

first, second = mnist.data[:50000], mnist.data[50000:]
print(first.shape, second.shape)

standard_embedding = umap.UMAP(random_state=42).fit(first)
standard_embedding.update(second)

On update I see:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/var/folders/3d/d0dl2ykn6c18qg7kg_j7tplm0000gn/T/ipykernel_98177/3602609767.py in <module>
----> 1 standard_embedding.update(second)

~/.pyenv/versions/3.9.6/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/umap/umap_.py in update(self, X)
   3129 
   3130         else:
-> 3131             self._knn_search_index.update(X)
   3132             self._raw_data = self._knn_search_index._raw_data
   3133             (

~/.pyenv/versions/3.9.6/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pynndescent/pynndescent_.py in update(self, X)
   1611         X = check_array(X, dtype=np.float32, accept_sparse="csr", order="C")
   1612 
-> 1613         original_order = np.argsort(self._vertex_order)
   1614 
   1615         if self._is_sparse:

AttributeError: 'NNDescent' object has no attribute '_vertex_order'

Is this expected behavior? Am I using UMAP improperly here? I see an example of aligned_umap, but I was hoping to use the standard UMAP, as I do not have relations.

Issue Analytics

Created 2 years ago
Comments: 7 (2 by maintainers)

Top GitHub Comments

2 reactions

ThomasNickerson commented, Oct 21, 2021

I actually ran into this problem yesterday and have a fix ready to go @lmcinnes, will open a PR. It’s only an issue for the n>4096 path in update.

0 reactions

preet2312 commented, Dec 16, 2022

Hey @lmcinnes & @vedrocks15, I am working on something similar and got the same divide-by-zero error.
My dataset has more than 4M rows and 384 dimensions. I am trying to reduce it to 50 dimensions, but my system with 32 GB of RAM cannot take all 4M rows at once, so I had to go with batch processing. I am trying to fit small chunks of data to UMAP, and in the process update doesn't seem to help much.

The first small chunk: xvs[:10000].shape => (10000, 384)

model1 = umap.UMAP(
            n_neighbors=30,
            min_dist=0.0,
            n_components=50,
            random_state=42,
            ).fit(xvs[:10000])

model1.embedding_.shape => (10000, 50)

model1.update(xvs[10000:20000]) gives the following error:

ZeroDivisionError                         Traceback (most recent call last)
Input In [13], in <cell line: 1>()
----> 1 model1.update(xvs[10000:20000])

File ~\anaconda3\envs\py39\lib\site-packages\umap\umap_.py:3348, in UMAP.update(self, X)
   3344 init = np.zeros(
   3345     (self._raw_data.shape[0], self.n_components), dtype=np.float32
   3346 )
   3347 init[:original_size] = self.embedding_
-> 3348 init_update(init, original_size, self._knn_indices)
   3350 if self.n_epochs is None:
   3351     n_epochs = 0

ZeroDivisionError: division by zero

But when I re-run the same update code, I get a different error this time:

ValueError                                Traceback (most recent call last)
Input In [14], in <cell line: 1>()
----> 1 model1.update(xvs[10000:20000])

File ~\anaconda3\envs\py39\lib\site-packages\umap\umap_.py:3347, in UMAP.update(self, X)
   3329 self.graph_, self._sigmas, self._rhos = fuzzy_simplicial_set(
   3330     self._raw_data,
   3331     self.n_neighbors,
   (...)
   3341     self.verbose,
   3342 )
   3344 init = np.zeros(
   3345     (self._raw_data.shape[0], self.n_components), dtype=np.float32
   3346 )
-> 3347 init[:original_size] = self.embedding_
   3348 init_update(init, original_size, self._knn_indices)
   3350 if self.n_epochs is None:

ValueError: could not broadcast input array from shape (10000,50) into shape (20000,50)
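
The second traceback looks like a consequence of the first rather than a separate bug. Judging from the quoted lines, update appears to extend self._raw_data before init_update runs. When the first call fails with ZeroDivisionError, the model is left holding 20000 rows of raw data but a 10000-row embedding_. On the retry, original_size is read as 20000, and init[:original_size] = self.embedding_ can no longer broadcast. The sketch below reproduces that sequence at a smaller scale. ToyModel and the body of its update method are hypothetical stand-ins inferred from the traceback, not UMAP's actual implementation.

```python
import numpy as np


class ToyModel:
    """Hypothetical stand-in for the state that update touches (not UMAP code)."""

    def __init__(self, data, n_components):
        self._raw_data = data
        self.n_components = n_components
        self.embedding_ = np.zeros((data.shape[0], n_components), dtype=np.float32)

    def update(self, X):
        original_size = self._raw_data.shape[0]
        # The raw data grows before the step that can fail.
        self._raw_data = np.vstack([self._raw_data, X])
        init = np.zeros(
            (self._raw_data.shape[0], self.n_components), dtype=np.float32
        )
        # Succeeds only while embedding_ still has original_size rows.
        init[:original_size] = self.embedding_
        # Stands in for the failure inside init_update; embedding_ is never refreshed.
        raise ZeroDivisionError("division by zero")


model = ToyModel(np.zeros((100, 8), dtype=np.float32), n_components=5)
batch = np.zeros((100, 8), dtype=np.float32)

errors = []
for _ in range(2):
    try:
        model.update(batch)
    except (ZeroDivisionError, ValueError) as exc:
        errors.append(type(exc).__name__)

print(errors)
```

If this reading is right, a model whose update call has failed is in an inconsistent state, and refitting it is safer than calling update again.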

I'm not sure how to approach this problem, or whether there is a better solution for batch processing in UMAP. I just need to fit the chunks of data and get model.embedding_ at the end for the next steps.

Thank you!
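
One possible workaround for the memory limit, sketched here under assumptions, is to skip update altogether: fit on a sample that fits in RAM, then push the remaining rows through transform in chunks. This is a different technique from incremental training, since the model never learns from the later chunks. The helper name embed_in_batches and its parameters are made up for illustration. It relies only on the scikit-learn style fit_transform and transform methods, which umap.UMAP provides.

```python
import numpy as np


def embed_in_batches(reducer, X, fit_size, batch_size):
    """Fit on the first fit_size rows, then embed the remaining rows chunk by chunk.

    reducer is any object with scikit-learn style fit_transform and transform
    methods. This helper is an illustrative sketch, not part of umap-learn.
    """
    # Embedding of the rows the model was fitted on.
    parts = [reducer.fit_transform(X[:fit_size])]
    # Remaining rows are projected with the fitted model; the model is not updated.
    for start in range(fit_size, X.shape[0], batch_size):
        parts.append(reducer.transform(X[start:start + batch_size]))
    return np.vstack(parts)
```

With umap-learn installed, a call might look like embed_in_batches(umap.UMAP(n_components=50, random_state=42), xvs, fit_size=100000, batch_size=10000), where the sizes are placeholders to tune against available memory. The result is only as good as the fitted sample, so it is probably worth shuffling the rows first so that the first fit_size rows are representative of the whole dataset.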
