T-SNE fails for CSR matrix
T-SNE fails for a CSR matrix with:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
Code to reproduce:
import numpy as np
from sklearn.neighbors import BallTree, kneighbors_graph
from sklearn.manifold import TSNE

X = np.random.randn(100, 10)
bt = BallTree(X, leaf_size=300)
# kneighbors_graph returns a square (100 x 100) scipy.sparse CSR matrix
distances = kneighbors_graph(bt, n_neighbors=40, mode="distance", metric="cosine")
X_embedded = TSNE(n_components=2, metric="precomputed").fit_transform(distances)
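Reason: when the distance matrix is a square Compressed Sparse Row (CSR) matrix, X < 0 is a sparse matrix, and np.any(X < 0) is also a sparse matrix rather than a boolean, so using it in an if statement raises the error.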
<ipython-input-55-71728b7132f2> in <module>()
----> 1 X_embedded = TSNE(n_components=2, metric="precomputed").fit_transform(distances)
      2 
      3 ax = plt.scatter(X_embedded[:,0], X_embedded[:,1], c=clusters[0:len(X_embedded)]).axes

/Users/roman/.virtualenvs/wordmap/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in fit_transform(self, X, y)
    857             Embedding of the training data in low-dimensional space.
    858         """
--> 859         embedding = self._fit(X)
    860         self.embedding_ = embedding
    861         return self.embedding_

/Users/roman/.virtualenvs/wordmap/lib/python2.7/site-packages/sklearn/manifold/t_sne.pyc in _fit(self, X, skip_num_points)
    645             if X.shape[0] != X.shape[1]:
    646                 raise ValueError("X should be a square distance matrix")
--> 647             if np.any(X < 0):
    648                 raise ValueError("All distances should be positive, the "
    649                                  "precomputed distances given as X is not "

/Users/roman/.virtualenvs/wordmap/lib/python2.7/site-packages/scipy/sparse/base.pyc in __bool__(self)
    236             return self.nnz != 0
    237         else:
--> 238             raise ValueError("The truth value of an array with more than one "
    239                              "element is ambiguous. Use a.any() or a.all().")
    240     __nonzero__ = __bool__

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
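The root cause can be reproduced with SciPy alone, without t-SNE. The sketch below assumes a recent NumPy and SciPy. It shows that truth-testing the sparse result of X < 0 raises the same ValueError as in the traceback above, where `if np.any(X < 0):` ends up truth-testing such a matrix. It also shows one way to check for negative entries that works for both dense arrays and sparse matrices. `has_negative_entries` is a hypothetical helper written for illustration; it is not necessarily the check scikit-learn adopted in #10482.

```python
import numpy as np
import scipy.sparse as sp


def has_negative_entries(X):
    """Return True if any entry of X is negative.

    Hypothetical helper (not scikit-learn's API): works for both dense
    arrays and scipy.sparse matrices.
    """
    if sp.issparse(X):
        # X.min() takes implicit (unstored) zeros into account and returns
        # a scalar, so the comparison yields a plain boolean.
        return bool(X.min() < 0)
    return bool(np.any(np.asarray(X) < 0))


# A small square "distance matrix" in CSR format.
D = sp.csr_matrix(np.array([[0.0, 1.0, 2.0],
                            [1.0, 0.0, 3.0],
                            [2.0, 3.0, 0.0]]))

# D < 0 is itself a sparse boolean matrix; truth-testing a sparse matrix
# whose shape is not (1, 1) raises ValueError.
try:
    if D < 0:
        pass
except ValueError as exc:
    print("ValueError:", exc)

print(has_negative_entries(D))  # False
print(has_negative_entries(sp.csr_matrix([[0.0, -1.0], [-1.0, 0.0]])))  # True
```

Using X.min() avoids ever converting a sparse matrix to a boolean directly, and it never densifies the matrix.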
Top GitHub Comments
Ah, that makes sense. I can reproduce this, thanks for the code snippet! As you say, the root cause is that
np.any(distance > 0)
evaluates to a sparse array instead of a boolean. Apparently it's a known issue with sparse CSR matrices, though I wasn't aware it could do that. This line was recently introduced in https://github.com/scikit-learn/scikit-learn/pull/9032, but the question is why the test
sklearn/manifold/tests/test_t_sne.py::test_fit_csr_matrix
doesn't fail. Ping @tomMoral @jnothman: this sounds like a regression in 0.19.
Fixed in #10482