
Feature request: Parallel/Multicore implementation of t-SNE


Has there been any discussion on implementing a multicore version of t-SNE in sklearn?

The fastest version that I have seen/used is https://github.com/claczny/VizBin/tree/master/src/backend/bh_tsne .

I think a simple first addition would be to change line 160 of https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/manifold/t_sne.py from pdist(X_embedded, "sqeuclidean") to a function from sklearn.metrics.pairwise that supports the n_jobs parameter.
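
For illustration, here is a minimal sketch of that swap, assuming pairwise_distances is the replacement (the surrounding t_sne.py code is not reproduced here):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.metrics import pairwise_distances

X_embedded = np.random.RandomState(0).randn(500, 2)

# Current approach: single-threaded, returns condensed distances.
d_condensed = pdist(X_embedded, "sqeuclidean")

# Proposed approach: same distances, chunked across cores via joblib;
# n_jobs=-1 uses all available cores.
d_square = pairwise_distances(X_embedded, metric="sqeuclidean", n_jobs=-1)

# pairwise_distances returns a square matrix, so downstream code that
# expects pdist's condensed form would need squareform (or vice versa).
assert np.allclose(squareform(d_condensed), d_square)
```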

Regarding the actual algorithm, https://github.com/DmitryUlyanov/Multicore-TSNE implements this in another language, so I cannot easily see what has been done. I believe that method only supports 2D embeddings (which could be fine if noted) and is very fast.
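
For reference, that library exposes a scikit-learn-like interface; a minimal usage sketch, assuming the package is installed (pip install MulticoreTSNE) and follows the API described in its README:

```python
import numpy as np
from MulticoreTSNE import MulticoreTSNE as TSNE  # sklearn-like wrapper

X = np.random.RandomState(0).randn(10000, 50)

# n_jobs sets the number of cores used for the heavy computations;
# only 2D embeddings are supported, per the point above.
X_embedded = TSNE(n_components=2, n_jobs=4).fit_transform(X)
print(X_embedded.shape)  # (10000, 2)
```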

I also had some very basic (and potentially naive) ideas about using Bayesian optimization to speed up the algorithm, if anyone has any insight: https://www.reddit.com/r/MachineLearning/comments/78i9rh/discussion_bayesian_optimization_of_tsne/ — but this may not be the right place for that.
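
To make the idea concrete: Bayesian optimization is more often applied to t-SNE's hyperparameters than to its runtime. A purely illustrative sketch of that flavour, assuming scikit-optimize is installed and using TSNE's kl_divergence_ attribute as a rough objective (the thread above discusses the idea more broadly):

```python
import numpy as np
from sklearn.manifold import TSNE
from skopt import gp_minimize  # pip install scikit-optimize

X = np.random.RandomState(0).randn(1000, 30)

def objective(params):
    (perplexity,) = params
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=0)
    tsne.fit(X)
    # Final KL cost of the embedding. KL values obtained at different
    # perplexities are not strictly comparable, so this is a rough
    # heuristic rather than a principled objective.
    return tsne.kl_divergence_

result = gp_minimize(objective, [(5.0, 50.0)], n_calls=15, random_state=0)
print("best perplexity:", result.x[0])
```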

Just trying to think of ways to use this on larger datasets.

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 14 (11 by maintainers)

Top GitHub Comments

10 reactions
lmcinnes commented, Feb 28, 2018

I believe parallelising the neighbour computation (be it via ball trees, or full distance computation) would be relatively straightforward. There are, of course, seriously diminishing returns on that for much more than 4 or 8 cores. Parallelising the gradient descent is rather harder, but I think there are even fewer gains to be had there. I would be willing to take a look at this at some point if people are still interested.
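
For context, the neighbour computation described here is already exposed with joblib parallelism elsewhere in scikit-learn's public API; a minimal sketch, independent of t-SNE's internals:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.random.RandomState(0).randn(5000, 50)

# Ball-tree neighbour queries, parallelised across all cores.
# Barnes-Hut t-SNE typically needs about 3 * perplexity neighbours.
nn = NearestNeighbors(n_neighbors=90, algorithm="ball_tree", n_jobs=-1)
nn.fit(X)
distances, indices = nn.kneighbors(X)
print(distances.shape, indices.shape)  # (5000, 90) (5000, 90)
```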

1 reaction
cmarmo commented, Jan 29, 2021

The neighbors computation has been multithreaded in #15082 and the gradient computation has been parallelized in #13264. Is this issue still relevant? Thanks.
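
Following those PRs, recent scikit-learn releases (0.22 and later) expose this directly through the estimator's n_jobs parameter:

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.RandomState(0).randn(5000, 50)

# n_jobs parallelises the neighbour search used by the default
# Barnes-Hut method (#15082); the gradient computation is
# parallelised internally following #13264.
X_embedded = TSNE(n_components=2, n_jobs=-1, random_state=0).fit_transform(X)
print(X_embedded.shape)  # (5000, 2)
```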


Top Results From Across the Web

openTSNE: Extensible, parallel implementations of t-SNE
The library is designed to be extensible, and it is easy to implement and use your own components, which makes experimentation very simple. …

sklearn.manifold.TSNE — scikit-learn 1.2.0 documentation
t-SNE [1] is a tool to visualize high-dimensional data. It converts similarities between data points to joint probabilities and tries to minimize the …

T-distributed Stochastic Neighbor Embedding (t-SNE)
What if you have hundreds of features or data points in a dataset, and you want to represent them in a 2-dimensional or …

t-SNE Algorithm in Machine Learning - EnjoyAlgorithms
If we consider every feature as a dimension to visualize, we cannot imagine more than three dimensions. That's where we require dimensionality …

T-distributed Stochastic Neighbor Embedding (t-SNE)
Feature Request: T-distributed Stochastic Neighbor Embedding (t-SNE) … t-SNE is a manifold learning technique, which learns low …
