ENH: speeding up pairplot and alike
I think there might be room for speeding up methods like sns.pairplot.
I want to be able to draw very large grids (100x100 panels or more), and this is of course unrealistic with the current code. I wonder if we could parallelise some of the loops, like this one:
On matplotlib’s side, I am confident there would be no blocker, since they have this gallery example: https://matplotlib.org/stable/gallery/misc/multiprocess_sgskip.html
I am not entirely sure this would be the correct place to do it. Let me know if this is of interest and feasible; I would be happy to help.
Issue Analytics
- Created 2 years ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
Here’s an extraordinarily simple proof of principle that I threw together just for exploration:
With n_jobs=1 this works as expected, but with n_jobs=4 I get four blank axes. I’ve done no further debugging or tuning.

I think I’ll close for now, not because I’m opposed to the idea of parallelizing the axisgrid objects in principle, but because I’m skeptical that it’s possible for reasons outside of seaborn’s control. (Unlike some projects that leave open any issue touching on a feature that doesn’t technically exist, I prefer to keep the issue queue relatively high signal and close issues below a certain priority threshold even if they are good ideas; I’ve revisited lots of closed issues in the past.)
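The comment above does not diagnose the four blank axes seen with n_jobs=4. One plausible cause, assuming the prototype handed Axes objects to a process-based pool (the n_jobs parameter suggests a joblib-style API, but that is a guess), is that arguments sent to worker processes are pickled: each worker draws on its own copy, and the Axes in the parent's figure never change. The sketch below mimics that transfer with an explicit pickle round-trip instead of starting real processes; make_panel, draw and send_to_worker are hypothetical helpers, and the dict "panel" is a stand-in for a matplotlib Axes.

```python
import pickle


def make_panel(name):
    """Hypothetical stand-in for a matplotlib Axes: it only records what was drawn."""
    return {"name": name, "artists": []}


def draw(panel):
    # "Drawing" in the worker mutates whatever object the worker received.
    panel["artists"].append(f"scatter on {panel['name']}")
    return panel


def send_to_worker(obj):
    # A process pool pickles arguments before shipping them to a worker process,
    # so the worker operates on a copy rather than on the parent's object.
    return pickle.loads(pickle.dumps(obj))


panels = [make_panel(f"ax{i}") for i in range(4)]

# Simulated n_jobs=4: each panel is copied into a "worker" and drawn on there.
returned = [draw(send_to_worker(p)) for p in panels]

print([len(p["artists"]) for p in panels])    # -> [0, 0, 0, 0]  parent objects untouched
print([len(p["artists"]) for p in returned])  # -> [1, 1, 1, 1]  only the copies were drawn on
```

If this is the cause, the usual remedy is to have workers return plain data (arrays, counts) and keep every drawing call in the parent process.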
I’d rather consider the usefulness/feasibility of this idea in the context of an existing prototype of how one might parallelize operations over the axes in a single matplotlib figure in general.
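A minimal sketch of what such a prototype might look like, under stated assumptions: the expensive per-panel work (here a 2D histogram via numpy.histogram2d) runs in a concurrent.futures thread pool, and only the finished arrays are drawn, serially, into one figure in the calling thread. The helpers bin_pair and parallel_pair_grid are hypothetical and not part of seaborn. Threads are used to avoid pickling entirely; whether they give a real speedup depends on how much of the work releases the GIL, and nothing here has been benchmarked.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def bin_pair(x, y, bins=30):
    """Hypothetical helper: the costly per-panel computation, with no drawing."""
    counts, xedges, yedges = np.histogram2d(x, y, bins=bins)
    return counts, xedges, yedges


def parallel_pair_grid(data, n_jobs=4, bins=30):
    """Hypothetical pairplot-like grid: compute in parallel, draw serially."""
    n_vars = data.shape[1]
    pairs = list(product(range(n_vars), repeat=2))

    # 1) Parallel step: only arrays cross the thread boundary, never Axes.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = {
            (i, j): pool.submit(bin_pair, data[:, j], data[:, i], bins)
            for i, j in pairs
            if i != j
        }
        binned = {key: fut.result() for key, fut in futures.items()}

    # 2) Serial step: every matplotlib call happens in the calling thread.
    fig, axes = plt.subplots(
        n_vars, n_vars, figsize=(2 * n_vars, 2 * n_vars), squeeze=False
    )
    for i, j in pairs:
        ax = axes[i, j]
        if i == j:
            ax.hist(data[:, i], bins=bins)  # diagonal: 1D histogram
        else:
            counts, xedges, yedges = binned[(i, j)]
            # histogram2d indexes counts as [x_bin, y_bin]; pcolormesh wants [row=y, col=x].
            ax.pcolormesh(xedges, yedges, counts.T)
    return fig, axes


rng = np.random.default_rng(0)
data = rng.normal(size=(10_000, 4))
fig, axes = parallel_pair_grid(data, n_jobs=4)
```

For a 100x100 grid, step 2 still creates and draws 10,000 Axes serially, so this pattern likely helps only to the extent that computation, rather than artist creation and rendering, dominates the runtime.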
Thanks for the link to the heat-scatter. I will experiment with that and take some measurements.
In the meantime, feel free to close the issue if you think this is better addressed on matplotlib’s side, or that it is unrealistic or out of scope here.