Box-Plot with outlier jitter


[image: box plot overlaid with a swarm plot]

What you see in that picture is a workaround for what I would really like to have. A web search usually turns up the combine-boxplot-with-swarmplot solution. IMHO it would improve seaborn if this could be done directly, without a workaround.

The problems with that example are:

  1. The outliers are drawn twice (green and red circles). Only the jittered outliers (the green ones) should be drawn.
  2. The non-outliers are also drawn. There is no need for them.

This is an MWE to reproduce that picture.

#!/usr/bin/env python3
import random
import pandas
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme()

random.seed(0)

df = pandas.DataFrame({
    'Vals': random.choices(range(200), k=200)})
df_outliers = pandas.DataFrame({
    'Vals': random.choices(range(400, 700), k=20)})

df = pandas.concat([df, df_outliers], axis=0)

flierprops = {
    'marker': 'o',
    'markeredgecolor': 'red',
    'markerfacecolor': 'none'
}

# Usual boxplot
ax = sns.boxplot(y='Vals', data=df, flierprops=flierprops)

# Add jitter with the swarmplot function
ax = sns.swarmplot(y='Vals', data=df, linewidth=.75, color='none', edgecolor='green')
plt.show()
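For comparison, here is a minimal sketch of one way to avoid both problems with existing functions: hide the box plot's own fliers with `showfliers=False` and overlay only the outliers with `sns.stripplot`, which applies jitter. The helper name `boxplot_with_jittered_outliers` and its parameters are hypothetical, not part of seaborn. The outliers are computed with `matplotlib.cbook.boxplot_stats`, on the assumption that it applies the same whisker rule as the drawn box plot.

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.cbook import boxplot_stats


def boxplot_with_jittered_outliers(values, whis=1.5, jitter=0.1, ax=None):
    """Hypothetical helper: box plot without fliers plus jittered outliers only."""
    values = np.asarray(values, dtype=float)

    # Points beyond the whiskers, using matplotlib's box plot statistics
    fliers = boxplot_stats(values, whis=whis)[0]['fliers']

    # Box plot without its own outlier markers, so nothing is drawn twice
    ax = sns.boxplot(y=values, whis=whis, showfliers=False, ax=ax)

    # Overlay only the outliers; the non-outliers are never plotted as points
    if len(fliers):
        sns.stripplot(y=fliers, jitter=jitter, color='green', ax=ax)
    return ax, fliers
```

Calling `boxplot_with_jittered_outliers(df['Vals'])` followed by `plt.show()` should display the result for the data frame from the MWE. Note that this gives random jitter, not the non-overlapping layout of a swarm plot.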

Issue Analytics

  • State: closed
  • Created 10 months ago
  • Comments: 16 (7 by maintainers)

Top GitHub Comments

2 reactions
jhncls commented, Nov 23, 2022

Here is a hacky way to work with a swarmplot instead of a stripplot for the outliers:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme()

df = pd.DataFrame({'Vals': np.concatenate([np.random.randint(0, 200, size=1000),
                                           np.random.randint(400, 700, size=100),
                                           np.arange(600, 620)])})
df['x'] = np.random.randint(0, 3, len(df))

ax = sns.boxplot(x='x', y='Vals', data=df, orient='v')

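# Collect the outlier coordinates from the box plot, then remove those artists.
# In a default box plot, the fliers are the lines drawn with markers only.
xpos = np.array([])
ypos = np.array([])
for line in list(ax.lines):  # iterate over a copy, since lines are removed in the loop
    if line.get_linestyle() == 'None':
        xpos = np.append(xpos, line.get_xdata())
        ypos = np.append(ypos, line.get_ydata())
        line.remove()
# Redraw the collected outliers as a swarm on the same axes
sns.swarmplot(x=xpos, y=ypos, ax=ax, color='red', orient='v')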

plt.tight_layout()
plt.show()

[image: resulting box plots with the outliers drawn as a red swarm]

0 reactions
mwaskom commented, Nov 26, 2022

Yes, you’d also need a stat transform that filters to/out outliers. (And a swarm mark since that’s apparently what’s actually desired here, not jitter).

Boxplots are annoying in that they’re a “standard” plot type but they’re actually quite complicated to make and open the door to all sorts of API complexity.
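To make the filtering step concrete, here is a rough sketch of what "filters to/out outliers" could mean for grouped data, written with plain pandas. It is not a seaborn stat transform: `filter_outliers` and its parameters are hypothetical names, and the 1.5 × IQR fences are an assumption based on the usual Tukey box plot convention.

```python
import pandas as pd


def filter_outliers(df, value, by=None, whis=1.5, keep="outliers"):
    """Hypothetical helper: keep only the outlier rows, or only the inlier rows."""

    def beyond_fences(s):
        # Tukey fences: whis * IQR below the first and above the third quartile
        q1, q3 = s.quantile(0.25), s.quantile(0.75)
        iqr = q3 - q1
        return (s < q1 - whis * iqr) | (s > q3 + whis * iqr)

    if by is None:
        is_outlier = beyond_fences(df[value])
    else:
        # Compute the fences separately for each group (one box per group)
        is_outlier = df.groupby(by)[value].transform(beyond_fences)

    # Use a plain boolean array so a duplicated index cannot cause misalignment
    mask = is_outlier.astype(bool).to_numpy()
    return df[mask] if keep == "outliers" else df[~mask]
```

With the data frame from the comment above, `filter_outliers(df, 'Vals', by='x')` would return the rows to pass to a swarm or strip plot, and `keep="inliers"` would return the rest.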
