Feature request: Parallel/Multicore implementation of t-SNE
Has there been any discussion on implementing a multicore version of t-SNE in sklearn?
The fastest version that I have seen/used is https://github.com/claczny/VizBin/tree/master/src/backend/bh_tsne .
I think one simple addition would be to change line 160 of https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/manifold/t_sne.py from `pdist(X_embedded, "sqeuclidean")` to the equivalent function in `sklearn.metrics.pairwise`, in order to use its `n_jobs` parameter.
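As a rough sketch of what that swap could look like (the array name `X_embedded` is taken from `t_sne.py`; the random data here is just a stand-in), `pairwise_distances` already accepts the `"sqeuclidean"` metric and an `n_jobs` argument:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.metrics import pairwise_distances

# Stand-in for the low-dimensional embedding used inside t_sne.py.
X_embedded = np.random.RandomState(0).randn(100, 2)

# Current approach: scipy's single-threaded condensed distance matrix.
d_scipy = squareform(pdist(X_embedded, "sqeuclidean"))

# Possible replacement: pairwise_distances exposes n_jobs for parallelism.
d_sklearn = pairwise_distances(X_embedded, metric="sqeuclidean", n_jobs=-1)

# Both should produce the same squared Euclidean distance matrix.
assert np.allclose(d_scipy, d_sklearn)
```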
Regarding the actual algorithm, https://github.com/DmitryUlyanov/Multicore-TSNE has implemented this, but in a different programming language, so I cannot easily see what has been done. I believe that method only works for 2D embeddings (which could be fine if noted) and is very fast.
I also had some very basic (and potentially naive) ideas about using Bayesian optimization to speed up the algorithm: https://www.reddit.com/r/MachineLearning/comments/78i9rh/discussion_bayesian_optimization_of_tsne/. I would appreciate any insight on that, though this may not be the right place for it.
Just trying to think of ways to use this on larger datasets.
Issue Analytics
- State:
- Created 6 years ago
- Comments: 14 (11 by maintainers)
Top GitHub Comments
I believe parallelising the neighbour computation (whether via ball trees or a full distance computation) would be relatively straightforward. There are, of course, seriously diminishing returns on that beyond 4 or 8 cores. Parallelising the gradient descent is rather harder, and I think there are even fewer gains to be had there. I would be willing to take a look at this at some point if people are still interested.
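To illustrate the neighbour-computation part of this: scikit-learn's `NearestNeighbors` estimator already accepts an `n_jobs` parameter, so the k-NN step that Barnes-Hut t-SNE relies on can be parallelised independently of the gradient descent. A minimal sketch (the data and neighbour count here are arbitrary stand-ins):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Stand-in for the high-dimensional t-SNE input.
X = np.random.RandomState(0).randn(500, 10)

# Ball-tree neighbour search spread across all available cores.
nn = NearestNeighbors(n_neighbors=30, algorithm="ball_tree", n_jobs=-1)
nn.fit(X)
distances, indices = nn.kneighbors(X)

print(indices.shape)  # (500, 30)
```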
The neighbors computation has been multithreaded in #15082 and the gradient computation has been parallelized in #13264. Is this issue still relevant? Thanks.