update function of UMAP does not work
I'm trying to build an incremental trainer for UMAP, updating on batches of data. I'm testing this out with MNIST.
import numpy as np
import sklearn.datasets
import umap
import umap.utils as utils
import umap.aligned_umap
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA
mnist = fetch_openml('mnist_784', version=1)
mnist.target = mnist.target.astype(int)
first, second = mnist.data[:50000], mnist.data[50000:]
print(first.shape, second.shape)
standard_embedding = umap.UMAP(random_state=42).fit(first)
standard_embedding.update(second)
On calling update, I see:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/var/folders/3d/d0dl2ykn6c18qg7kg_j7tplm0000gn/T/ipykernel_98177/3602609767.py in <module>
----> 1 standard_embedding.update(second)
~/.pyenv/versions/3.9.6/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/umap/umap_.py in update(self, X)
3129
3130 else:
-> 3131 self._knn_search_index.update(X)
3132 self._raw_data = self._knn_search_index._raw_data
3133 (
~/.pyenv/versions/3.9.6/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pynndescent/pynndescent_.py in update(self, X)
1611 X = check_array(X, dtype=np.float32, accept_sparse="csr", order="C")
1612
-> 1613 original_order = np.argsort(self._vertex_order)
1614
1615 if self._is_sparse:
AttributeError: 'NNDescent' object has no attribute '_vertex_order'
Is this expected behavior? Am I using UMAP improperly here? I see an example of aligned_umap, but I was hoping to use the standard UMAP, as I do not have relations.
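In the meantime, a workaround that avoids update entirely is the fit-then-transform pattern from the "Transforming New Data with UMAP" docs: fit on the first batch, then map later batches through transform. Below is a minimal sketch of that batching loop, written against any estimator with a scikit-learn-style fit/transform interface. The DummyReducer is a hypothetical stand-in (it "embeds" by mean-centering) so the sketch runs without umap installed; in real use it would be replaced by umap.UMAP(random_state=42). Note that transform only projects new points into the already-fitted embedding; it does not refit the model on the new data.

```python
import numpy as np

class DummyReducer:
    """Hypothetical stand-in for umap.UMAP: 'embeds' by mean-centering.
    Swap in umap.UMAP(random_state=42) for the real thing."""
    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        self.embedding_ = X - self.mean_  # mimic UMAP's embedding_ attribute
        return self

    def transform(self, X):
        return X - self.mean_

def embed_in_batches(reducer, X, batch_size=10000):
    """Fit on the first batch, then push later batches through transform."""
    parts = [reducer.fit(X[:batch_size]).embedding_]
    for start in range(batch_size, len(X), batch_size):
        parts.append(reducer.transform(X[start:start + batch_size]))
    return np.vstack(parts)

X = np.random.RandomState(0).rand(250, 4)
emb = embed_in_batches(DummyReducer(), X, batch_size=100)
print(emb.shape)  # (250, 4)
```

For the MNIST example above, this would mean `umap.UMAP(random_state=42).fit(first)` followed by `.transform(second)`, at the cost of the second batch not influencing the learned manifold.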
Issue Analytics
- Created: 2 years ago
- Comments: 7 (2 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I actually ran into this problem yesterday and have a fix ready to go @lmcinnes, will open a PR. It's only an issue for the n > 4096 path in update.
Hey @lmcinnes & @vedrocks15, I am working on something similar and got a divide-by-zero error doing the same thing. My dataset has more than 4M rows and 384 dimensions. While trying to reduce the dimension to 50, my 32 GB RAM system can't take all 4M rows at once, so I had to go with batch processing. I am trying to fit small chunks of data to UMAP, and in the process update doesn't seem to help much. First small chunk:
xvs[:10000].shape => (10000, 384)
model1.update(xvs[10000:20000]) gives the following error.
But when I re-run the same update code, I get a different error this time. I'm not sure how to approach this problem, or whether there is a better solution for batch processing in UMAP, as I just need to fit the chunks of data and I need model.embedding_ at the end to follow the next steps. Thank you!
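For the "fit chunks, get embedding_ at the end" requirement, one approach that sidesteps update is to fit on a random subsample that fits in RAM and then transform the full dataset chunk by chunk, concatenating the results. This is a sketch under assumptions: the make_reducer factory, CenteringReducer stand-in, and chunk sizes are all illustrative; in real use make_reducer would return something like umap.UMAP(n_components=50), and the subsample size would be tuned to available memory.

```python
import numpy as np

class CenteringReducer:
    """Hypothetical stand-in for umap.UMAP(n_components=50): 'embeds' by mean-centering."""
    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        return self

    def transform(self, X):
        return X - self.mean_

def subsample_fit_then_transform(make_reducer, X, fit_size=1000, chunk=500, seed=42):
    """Fit on a random subsample that fits in memory, then embed every row in chunks."""
    rng = np.random.RandomState(seed)
    idx = rng.choice(len(X), size=min(fit_size, len(X)), replace=False)
    reducer = make_reducer().fit(X[idx])
    # Transform chunk by chunk so only one chunk is resident at a time.
    return np.vstack([reducer.transform(X[i:i + chunk])
                      for i in range(0, len(X), chunk)])

X = np.random.RandomState(0).rand(1200, 8)
emb = subsample_fit_then_transform(CenteringReducer, X, fit_size=600, chunk=500)
print(emb.shape)  # (1200, 8)
```

The trade-off is the same as with any fit-then-transform scheme: rows outside the subsample are projected into the learned embedding rather than shaping it, so the subsample needs to be representative of the full dataset.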