ValueError in transform method after fitting on > 4095 samples
After fitting UMAP on a dataset with more than 4095 samples, calling the transform() method on a different set of data raises an error.
How to reproduce the error:
```python
import numpy as np
import umap

reducer = umap.UMAP()
reducer.fit(np.random.rand(4096, 4000))
reducer.transform(np.random.rand(500, 4000))
```
Meanwhile, if we fit the model with 4095 samples or fewer, everything works just fine:
```python
reducer = umap.UMAP()
reducer.fit(np.random.rand(4095, 4000))
reducer.transform(np.random.rand(500, 4000))
```
Note: I have pynndescent==0.4.7 installed and umap-learn==0.4.5.
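For context on why the boundary sits exactly at 4095/4096: as the traceback below suggests, umap-learn's transform() only hands the neighbour query to pynndescent on one branch, and uses an exact distance-matrix search on another, so small fits never reach the failing pynndescent code path. A rough sketch of what such an exact brute-force k-NN path looks like (the function name and cutoff constant here are our own, chosen to match the boundary observed above, not umap-learn's actual code):

```python
import numpy as np

# Hypothetical cutoff mirroring the 4095/4096 boundary seen in the repro;
# umap-learn's real branching lives in umap_.py's transform().
SMALL_DATA_CUTOFF = 4096

def knn_bruteforce(data, queries, k):
    """Exact k-NN via the full pairwise Euclidean distance matrix
    (the kind of small-data path that avoids pynndescent entirely)."""
    diffs = queries[:, None, :] - data[None, :, :]
    dmat = np.sqrt((diffs ** 2).sum(axis=-1))
    indices = np.argsort(dmat, axis=1)[:, :k]
    dists = np.take_along_axis(dmat, indices, axis=1)
    return indices, dists

data = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
queries = np.array([[0.9, 0.1]])
idx, dist = knn_bruteforce(data, queries, k=2)
# idx[0] lists the two closest training points: [1, 0]
```

This is quadratic in memory, which is presumably why an approximate index takes over for larger datasets.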
Traceback
```
ValueError                                Traceback (most recent call last)
<ipython-input-9-3596db82cde5> in <module>
----> 1 transformed_data = reducer.transform(np.random.rand(500, 4000))

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/umap/umap_.py in transform(self, X)
   2069             dists = submatrix(dmat_shortened, indices_sorted, self._n_neighbors)
   2070         elif _HAVE_PYNNDESCENT:
-> 2071             indices, dists = self._rp_forest.query(X, self.n_neighbors)
   2072         elif self._sparse_data:
   2073             if not scipy.sparse.issparse(X):

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/pynndescent/pynndescent_.py in query(self, query_data, k, epsilon)
   1216         # query_data = check_array(query_data, dtype=np.float64, order='C')
   1217         query_data = np.asarray(query_data).astype(np.float32, order="C")
-> 1218         self._init_search_graph()
   1219         result = search(
   1220             query_data,

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/pynndescent/pynndescent_.py in _init_search_graph(self)
   1065
   1066         # Get rid of any -1 index entries
-> 1067         self._search_graph = self._search_graph.tocsr()
   1068         self._search_graph.data[self._search_graph.indices == -1] = 0.0
   1069         self._search_graph.eliminate_zeros()

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/scipy/sparse/lil.py in tocsr(self, copy)
    460         indptr = np.empty(M + 1, dtype=idx_dtype)
    461         indptr[0] = 0
->  462         _csparsetools.lil_get_lengths(self.rows, indptr[1:])
    463         np.cumsum(indptr, out=indptr)
    464         nnz = indptr[-1]

_csparsetools.pyx in scipy.sparse._csparsetools.lil_get_lengths()

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
```
Possibly relevant: if I run the transform() method again, I get a different error:
Traceback
```
AttributeError                            Traceback (most recent call last)
<ipython-input-11-3596db82cde5> in <module>
----> 1 transformed_data = reducer.transform(np.random.rand(500, 4000))

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/umap/umap_.py in transform(self, X)
   2069             dists = submatrix(dmat_shortened, indices_sorted, self._n_neighbors)
   2070         elif _HAVE_PYNNDESCENT:
-> 2071             indices, dists = self._rp_forest.query(X, self.n_neighbors)
   2072         elif self._sparse_data:
   2073             if not scipy.sparse.issparse(X):

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/pynndescent/pynndescent_.py in query(self, query_data, k, epsilon)
   1221             k,
   1222             self._raw_data,
-> 1223             self._search_forest,
   1224             self._search_graph.indptr,
   1225             self._search_graph.indices,

AttributeError: 'NNDescent' object has no attribute '_search_forest'
```
Issue Analytics
- Created: 3 years ago
- Comments: 7 (3 by maintainers)
Top GitHub Comments
There was a change in the internals of scipy sparse lil_matrix handling in the latest scipy release that broke some things. I have been working to catch all the issues for the last week or so. There is now a umap-learn 0.4.6 and pynndescent 0.4.8 that will hopefully resolve these issues. If they don’t, let me know, as there are probably still a few uncaught cases here somewhere.
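The step that actually failed in the first traceback above is pynndescent converting its LIL-format search graph to CSR and then zeroing out -1 "missing neighbour" entries (pynndescent_.py, lines 1066–1069). A minimal sketch of that same conversion pattern on a toy graph of our own (with the patched releases, or a scipy version matching the one pynndescent was written against, it completes without the buffer-dimension error):

```python
import numpy as np
from scipy.sparse import lil_matrix

# Toy stand-in for pynndescent's search graph (the real one is
# n_samples x n_samples, built during fit).
graph = lil_matrix((4, 4), dtype=np.float32)
graph[0, 1] = 0.5
graph[1, 2] = 0.25
graph[3, 0] = 1.0

# The same LIL -> CSR sequence that raised at pynndescent_.py:1067 above.
csr = graph.tocsr()
csr.data[csr.indices == -1] = 0.0  # drop any -1 index entries
csr.eliminate_zeros()
```

The ValueError came from scipy's tocsr() internals choking on the lil_matrix's row storage after the scipy-side change, not from anything in the graph's contents.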
The error was gone after I updated all the libraries in my environment, so I don't know which library was causing the issue. Before updating everything, I had tried updating only scipy and pynndescent, but the error was still there.
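Since the fix landed in specific releases (umap-learn 0.4.6 and pynndescent 0.4.8, per the maintainer's comment above), a quick way to confirm an environment is past them is a small numeric version comparison (the helper name is ours; it assumes plain `x.y.z` version strings, so string comparison pitfalls like `"0.4.10" < "0.4.8"` are avoided):

```python
from importlib.metadata import version  # Python 3.8+

def at_least(installed, minimum):
    """Compare dotted version strings numerically, e.g. '0.4.10' >= '0.4.8'."""
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(installed) >= parse(minimum)

# Example checks against the fixed releases mentioned in this thread:
# assert at_least(version("umap-learn"), "0.4.6")
# assert at_least(version("pynndescent"), "0.4.8")
```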