ValueError in transform method after fitting on > 4095 samples

After fitting UMAP on a dataset with more than 4095 samples, calling the transform() method on a different set of data raises a ValueError.

How to reproduce the error:

import numpy as np
import umap

reducer = umap.UMAP()
reducer.fit(np.random.rand(4096, 4000))
reducer.transform(np.random.rand(500, 4000))

By contrast, if we fit the model with 4095 samples or fewer, everything works fine:

reducer = umap.UMAP()
reducer.fit(np.random.rand(4095, 4000))
reducer.transform(np.random.rand(500, 4000))

Note: I have pynndescent==0.4.7 installed and umap-learn==0.4.5.

Traceback


ValueError                                Traceback (most recent call last)
<ipython-input-9-3596db82cde5> in <module>
----> 1 transformed_data = reducer.transform(np.random.rand(500, 4000))

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/umap/umap_.py in transform(self, X)
   2069             dists = submatrix(dmat_shortened, indices_sorted, self._n_neighbors)
   2070         elif _HAVE_PYNNDESCENT:
-> 2071             indices, dists = self._rp_forest.query(X, self.n_neighbors)
   2072         elif self._sparse_data:
   2073             if not scipy.sparse.issparse(X):

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/pynndescent/pynndescent_.py in query(self, query_data, k, epsilon)
   1216         # query_data = check_array(query_data, dtype=np.float64, order='C')
   1217         query_data = np.asarray(query_data).astype(np.float32, order="C")
-> 1218         self._init_search_graph()
   1219         result = search(
   1220             query_data,

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/pynndescent/pynndescent_.py in _init_search_graph(self)
   1065
   1066         # Get rid of any -1 index entries
-> 1067         self._search_graph = self._search_graph.tocsr()
   1068         self._search_graph.data[self._search_graph.indices == -1] = 0.0
   1069         self._search_graph.eliminate_zeros()

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/scipy/sparse/lil.py in tocsr(self, copy)
    460         indptr = np.empty(M + 1, dtype=idx_dtype)
    461         indptr[0] = 0
--> 462         _csparsetools.lil_get_lengths(self.rows, indptr[1:])
    463         np.cumsum(indptr, out=indptr)
    464         nnz = indptr[-1]

_csparsetools.pyx in scipy.sparse._csparsetools.lil_get_lengths()

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

This may or may not be relevant, but if I run the transform() method again I get a different error:

Traceback


AttributeError                            Traceback (most recent call last)
<ipython-input-11-3596db82cde5> in <module>
----> 1 transformed_data = reducer.transform(np.random.rand(500, 4000))

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/umap/umap_.py in transform(self, X)
   2069             dists = submatrix(dmat_shortened, indices_sorted, self._n_neighbors)
   2070         elif _HAVE_PYNNDESCENT:
-> 2071             indices, dists = self._rp_forest.query(X, self.n_neighbors)
   2072         elif self._sparse_data:
   2073             if not scipy.sparse.issparse(X):

~/.local/share/virtualenvs/Test-yifRmGUs/lib/python3.7/site-packages/pynndescent/pynndescent_.py in query(self, query_data, k, epsilon)
   1221             k,
   1222             self._raw_data,
-> 1223             self._search_forest,
   1224             self._search_graph.indptr,
   1225             self._search_graph.indices,

AttributeError: 'NNDescent' object has no attribute '_search_forest'
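One pattern that would be consistent with the two tracebacks above is a lazy setup step that fails partway through: the first call raises in the middle of the setup, and the second call skips the setup because some of its state already exists, then trips over the part that was never built. The sketch below is a toy illustration of that pattern only. LazyIndex, its attributes, and run_twice are hypothetical names, and the actual pynndescent code may work differently.

```python
class LazyIndex:
    """Hypothetical stand-in for an index that builds search state on first query."""

    def __init__(self, fail_on_convert):
        self.fail_on_convert = fail_on_convert

    def _init_search_state(self):
        # Guard: skip the setup if it appears to have run already.
        if hasattr(self, "_graph"):
            return
        self._graph = "unconverted graph"  # the guard attribute is now set
        if self.fail_on_convert:
            # Stands in for the failing conversion step in the first traceback.
            raise ValueError("Buffer has wrong number of dimensions")
        self._forest = "search forest"  # never reached when the step above fails

    def query(self):
        self._init_search_state()
        return self._forest, self._graph


def run_twice(index):
    """Call query() twice and record the exception type raised by each call."""
    errors = []
    for _ in range(2):
        try:
            index.query()
            errors.append(None)
        except (ValueError, AttributeError) as exc:
            errors.append(type(exc).__name__)
    return errors


if __name__ == "__main__":
    print(run_twice(LazyIndex(fail_on_convert=True)))
    print(run_twice(LazyIndex(fail_on_convert=False)))
```

If this is what is happening, the second AttributeError is a side effect of the first failure rather than a separate bug, so the ValueError is the one to chase.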

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

lmcinnes commented, Jul 1, 2020

There was a change in the internals of scipy sparse lil_matrix handling in the latest scipy release that broke some things. I have been working to catch all the issues for the last week or so. There is now a umap-learn 0.4.6 and pynndescent 0.4.8 that will hopefully resolve these issues. If they don’t, let me know, as there are probably still a few uncaught cases here somewhere.
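A quick way to see whether an environment already has the releases mentioned above is to compare the installed versions against them. The following is a minimal sketch, assuming Python 3.8+ (for importlib.metadata) and plain dotted version strings. The helper names parse_version, needs_upgrade, and report are made up for this example, and the minimum versions are simply the ones named in the comment.

```python
import re
from importlib import metadata  # standard library, Python 3.8+

# Minimum versions named in the maintainer's comment.
MINIMUM_VERSIONS = {"umap-learn": "0.4.6", "pynndescent": "0.4.8"}


def parse_version(text):
    """Turn a dotted version such as '0.4.6' into a tuple of ints.

    Simplification: only the leading digits of each part are kept,
    so '0.5.0rc1' is treated the same as '0.5.0'.
    """
    numbers = []
    for part in text.split("."):
        match = re.match(r"\d+", part)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def needs_upgrade(installed, minimum):
    """Return True when the installed version is older than the minimum."""
    return parse_version(installed) < parse_version(minimum)


def report(minimums=MINIMUM_VERSIONS):
    """Map each package name to 'ok', 'upgrade', or 'not installed'."""
    status = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            status[name] = "not installed"
            continue
        status[name] = "upgrade" if needs_upgrade(installed, minimum) else "ok"
    return status


if __name__ == "__main__":
    for name, state in report().items():
        print(f"{name}: {state}")
```

Any package reported as "upgrade" can then be updated with pip, for example pip install --upgrade umap-learn pynndescent.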

100rab-S commented, Sep 28, 2021

The error went away when I updated all the libraries in my environment, so I don't know which library was causing the issue. Before updating everything, I tried updating only scipy and pynndescent, but the error was still there.
