ValueError in transform method after fitting on > 4095 samples
After fitting UMAP on a dataset with more than 4095 samples, calling the transform() method on a different set of data raises an error.
How to reproduce the error:
```python
import numpy as np
import umap

reducer = umap.UMAP()
reducer.fit(np.random.rand(4096, 4000))
reducer.transform(np.random.rand(500, 4000))
```
Meanwhile, if we fit the model with 4095 samples or fewer, everything works just fine:
```python
reducer = umap.UMAP()
reducer.fit(np.random.rand(4095, 4000))
reducer.transform(np.random.rand(500, 4000))
```
Note: I have pynndescent==0.4.7 installed and umap-learn==0.4.5.
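For context on why the boundary sits exactly at 4095/4096: as the traceback below suggests, umap-learn's transform() only hands the neighbour query to pynndescent on one branch, and uses an exact distance-matrix search on another, so small fits never reach the failing pynndescent code path. A rough sketch of what such an exact brute-force k-NN path looks like (the function name and cutoff constant here are our own, chosen to match the boundary observed above, not umap-learn's actual code):

```python
import numpy as np

# Hypothetical cutoff mirroring the 4095/4096 boundary seen in the repro;
# umap-learn's real branching lives in umap_.py's transform().
SMALL_DATA_CUTOFF = 4096

def knn_bruteforce(data, queries, k):
    """Exact k-NN via the full pairwise Euclidean distance matrix
    (the kind of small-data path that avoids pynndescent entirely)."""
    diffs = queries[:, None, :] - data[None, :, :]
    dmat = np.sqrt((diffs ** 2).sum(axis=-1))
    indices = np.argsort(dmat, axis=1)[:, :k]
    dists = np.take_along_axis(dmat, indices, axis=1)
    return indices, dists

data = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
queries = np.array([[0.9, 0.1]])
idx, dist = knn_bruteforce(data, queries, k=2)
# idx[0] lists the two closest training points: [1, 0]
```

This is quadratic in memory, which is presumably why an approximate index takes over for larger datasets.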
Traceback
```
ValueError                                Traceback (most recent call last)
<ipython-input-9-3596db82cde5> in <module>
----> 1 transformed_data = reducer.transform(np.random.rand(500, 4000))

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/umap/umap_.py in transform(self, X)
   2069             dists = submatrix(dmat_shortened, indices_sorted, self._n_neighbors)
   2070         elif _HAVE_PYNNDESCENT:
-> 2071             indices, dists = self._rp_forest.query(X, self.n_neighbors)
   2072         elif self._sparse_data:
   2073             if not scipy.sparse.issparse(X):

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/pynndescent/pynndescent_.py in query(self, query_data, k, epsilon)
   1216         # query_data = check_array(query_data, dtype=np.float64, order='C')
   1217         query_data = np.asarray(query_data).astype(np.float32, order="C")
-> 1218         self._init_search_graph()
   1219         result = search(
   1220             query_data,

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/pynndescent/pynndescent_.py in _init_search_graph(self)
   1065
   1066         # Get rid of any -1 index entries
-> 1067         self._search_graph = self._search_graph.tocsr()
   1068         self._search_graph.data[self._search_graph.indices == -1] = 0.0
   1069         self._search_graph.eliminate_zeros()

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/scipy/sparse/lil.py in tocsr(self, copy)
    460         indptr = np.empty(M + 1, dtype=idx_dtype)
    461         indptr[0] = 0
->  462         _csparsetools.lil_get_lengths(self.rows, indptr[1:])
    463         np.cumsum(indptr, out=indptr)
    464         nnz = indptr[-1]

_csparsetools.pyx in scipy.sparse._csparsetools.lil_get_lengths()

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
```
Possibly relevant: if I run the transform() method again, I get a different error:
Traceback
```
AttributeError                            Traceback (most recent call last)
<ipython-input-11-3596db82cde5> in <module>
----> 1 transformed_data = reducer.transform(np.random.rand(500, 4000))

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/umap/umap_.py in transform(self, X)
   2069             dists = submatrix(dmat_shortened, indices_sorted, self._n_neighbors)
   2070         elif _HAVE_PYNNDESCENT:
-> 2071             indices, dists = self._rp_forest.query(X, self.n_neighbors)
   2072         elif self._sparse_data:
   2073             if not scipy.sparse.issparse(X):

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/pynndescent/pynndescent_.py in query(self, query_data, k, epsilon)
   1221             k,
   1222             self._raw_data,
-> 1223             self._search_forest,
   1224             self._search_graph.indptr,
   1225             self._search_graph.indices,

AttributeError: 'NNDescent' object has no attribute '_search_forest'
```
Issue Analytics
- Created: 3 years ago
- Comments: 7 (3 by maintainers)
Top GitHub Comments
There was a change in the internals of scipy sparse lil_matrix handling in the latest scipy release that broke some things. I have been working to catch all the issues for the last week or so. There is now a umap-learn 0.4.6 and pynndescent 0.4.8 that will hopefully resolve these issues. If they don’t, let me know, as there are probably still a few uncaught cases here somewhere.
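The step that actually failed in the first traceback above is pynndescent converting its LIL-format search graph to CSR and then zeroing out -1 "missing neighbour" entries (pynndescent_.py, lines 1066–1069). A minimal sketch of that same conversion pattern on a toy graph of our own (with the patched releases, or a scipy version matching the one pynndescent was written against, it completes without the buffer-dimension error):

```python
import numpy as np
from scipy.sparse import lil_matrix

# Toy stand-in for pynndescent's search graph (the real one is
# n_samples x n_samples, built during fit).
graph = lil_matrix((4, 4), dtype=np.float32)
graph[0, 1] = 0.5
graph[1, 2] = 0.25
graph[3, 0] = 1.0

# The same LIL -> CSR sequence that raised at pynndescent_.py:1067 above.
csr = graph.tocsr()
csr.data[csr.indices == -1] = 0.0  # drop any -1 index entries
csr.eliminate_zeros()
```

The ValueError came from scipy's tocsr() internals choking on the lil_matrix's row storage after the scipy-side change, not from anything in the graph's contents.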
The error was gone after I updated all the libraries in my environment, so I don't know which library was causing the issue. Before updating everything, I had tried updating only scipy and pynndescent, but the error was still there.
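Since the fix landed in specific releases (umap-learn 0.4.6 and pynndescent 0.4.8, per the maintainer's comment above), a quick way to confirm an environment is past them is a small numeric version comparison (the helper name is ours; it assumes plain `x.y.z` version strings, so string comparison pitfalls like `"0.4.10" < "0.4.8"` are avoided):

```python
from importlib.metadata import version  # Python 3.8+

def at_least(installed, minimum):
    """Compare dotted version strings numerically, e.g. '0.4.10' >= '0.4.8'."""
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(installed) >= parse(minimum)

# Example checks against the fixed releases mentioned in this thread:
# assert at_least(version("umap-learn"), "0.4.6")
# assert at_least(version("pynndescent"), "0.4.8")
```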