ENH: speeding up pairplot and alike


I think there might be room for speeding up methods like sns.pairplot.

I want to be able to draw huge plots (100x100 or more) and this is of course unrealistic with the current code. I wonder if we could parallelise some of the loops like this one:

https://github.com/mwaskom/seaborn/blob/445a54aa46e279406949e0a3c2eed88d6cf80223/seaborn/axisgrid.py#L649

On matplotlib’s side I am confident there would be no blocker as they have this tutorial https://matplotlib.org/stable/gallery/misc/multiprocess_sgskip.html

I am not entirely sure if this would be the correct place to do it. Let me know if this is of interest/feasible and I would be happy to help.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
mwaskom commented, Jun 29, 2021

Here’s an extraordinarily simple proof of principle that I threw together just for exploration:

import numpy as np
import matplotlib.pyplot as plt
from joblib import Parallel, delayed

def pairplot(ax, x, y):
    # Draw one panel of the grid onto the given axes
    ax.scatter(x, y)

# Four independent (x, y) samples, one pair per panel
xs = np.random.randn(4, 10000)
ys = np.random.randn(4, 10000)

f, axs = plt.subplots(1, 4, sharex=True, sharey=True)
# Dispatch one plotting call per axes to the joblib workers
Parallel(n_jobs=4)(delayed(pairplot)(ax, x, y) for ax, x, y in zip(axs, xs, ys))

With n_jobs=1 this works as expected but with n_jobs=4 I get four blank axes. I’ve done no further debugging or tuning.
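
One plausible explanation for the blank axes (an assumption on my part, not something verified in this thread) is that joblib's default backend runs tasks in separate processes. The arguments, including each `ax`, are then pickled, and the worker draws on a copy that the parent never sees. The sketch below reproduces only that copy behaviour with `pickle`, without joblib or multiple processes:

```python
import pickle

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Round-tripping through pickle stands in for what happens when an Axes
# is sent to a worker process: the worker gets an independent copy.
ax_copy = pickle.loads(pickle.dumps(ax))
ax_copy.scatter([0, 1], [0, 1])

# The scatter landed on the copy; the original Axes is still empty.
n_original = len(ax.collections)
n_copy = len(ax_copy.collections)
print(n_original, n_copy)
```

If this is indeed the cause, a process-based approach would need to send the drawn result back to the parent rather than rely on in-place mutation of the axes.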

I think I’ll close for now, not because I’m opposed to the idea of parallelizing the axisgrid objects in principle, but because I’m skeptical that it’s possible for reasons outside of seaborn’s control. (Unlike some projects that leave open any issue touching on a feature that doesn’t technically exist, I prefer to keep the issue queue relatively high signal and close issues below a certain priority threshold even if they are good ideas; I’ve revisited lots of closed issues in the past).

I’d rather consider the usefulness/feasibility of this idea in the context of an existing prototype of how one might parallelize operations over the axes in a single matplotlib figure in general.
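
For anyone who wants to experiment, here is a minimal sketch of one possible workaround. It is not a seaborn or matplotlib feature: each panel is rendered to an RGBA array in a worker process, and the parent only pastes the images into its own figure. The helper names `render_panel` and `parallel_panels`, the panel size, and the output filename are all hypothetical choices for illustration.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, usable in worker processes
import matplotlib.pyplot as plt
from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg


def render_panel(xy, size_px=200, dpi=100):
    """Draw one scatter panel in its own figure and return RGBA pixels."""
    x, y = xy
    fig = Figure(figsize=(size_px / dpi, size_px / dpi), dpi=dpi)
    canvas = FigureCanvasAgg(fig)
    ax = fig.add_subplot(111)
    ax.scatter(x, y, s=1)
    canvas.draw()
    # Copy the buffer so the array stays valid after the figure is discarded
    return np.asarray(canvas.buffer_rgba()).copy()


def parallel_panels(xs, ys, n_jobs=4):
    """Render each (x, y) pair in a worker process and collect the images."""
    with ProcessPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(render_panel, zip(xs, ys)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs = rng.standard_normal((4, 10000))
    ys = rng.standard_normal((4, 10000))
    images = parallel_panels(xs, ys)

    # Assemble the pre-rendered panels in the parent process
    f, axs = plt.subplots(1, 4)
    for ax, img in zip(axs, images):
        ax.imshow(img)
        ax.set_axis_off()
    f.savefig("panels.png")
```

The trade-off is that each panel becomes a raster image, so shared axes, tick labels and legends would have to be handled separately. Whether this is actually faster than the serial loop depends on how the drawing cost compares with the process start-up and pickling overhead, which I have not measured.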

0 reactions
tupui commented, Jun 29, 2021

Thanks for the link to the heat-scatter. I will play with that and some measures.

In the meantime, feel free to close the issue if you think this would be more something to address on matplotlib’s side. Or that this is not realistic/out of scope here.


