boxenplot area scale calculation

The area method for calculating the width of boxenplot letter-value boxes is:

'area': lambda h, i, k: (1 - 2**(-k + i - 2)) / h}

in https://github.com/mwaskom/seaborn/blob/master/seaborn/categorical.py#L1890

IIUC, in order for the area to be proportional to the percentage of data covered, as documented (https://github.com/mwaskom/seaborn/blob/master/seaborn/categorical.py#L2672), the formula should rather be:

'area': lambda h, i, k: (1 - 2**(-k + i - 1)) / h}
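
To see how the two variants differ numerically, here is a minimal sketch that evaluates both formulas side by side. The function names are made up for illustration, and the reading of the arguments (h as the box height, i as the box index, k as the number of boxes) is an assumption, since the issue does not spell it out.

```python
# The two "area" width formulas quoted in the issue, written as plain functions.
# Assumption (not stated in the issue): h is a box's height, i its index,
# and k the number of letter-value boxes. The function names are hypothetical.

def area_width_current(h, i, k):
    """Formula quoted from seaborn/categorical.py in the issue."""
    return (1 - 2 ** (-k + i - 2)) / h


def area_width_proposed(h, i, k):
    """Formula proposed in the issue (exponent offset -1 instead of -2)."""
    return (1 - 2 ** (-k + i - 1)) / h


if __name__ == "__main__":
    k, h = 4, 1.0  # arbitrary example values
    for i in range(k):
        print(i, area_width_current(h, i, k), area_width_proposed(h, i, k))
```

Since the exponents differ by exactly one, the proposed formula at index i equals the current formula at index i + 1, so the proposed change amounts to shifting the box index by one.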

Top GitHub Comments

mwaskom commented, Aug 30, 2020

Is there a compelling reason to use area? Could we just deprecate it?

pierreDELANGEN commented, Nov 9, 2022

Hello, I’m digging up this issue. When comparing “area” scaling to the original paper’s representation (Figure 3C, https://doi.org/10.1080/10618600.2017.1305277), it seems that the Seaborn implementation is still incorrect.

[Screenshot: Seaborn output for random uniform data]

[Image: expected result from the paper]

For me, “area” scaling is the “correct” way of drawing boxenplots, as it directly represents the underlying PDF of the studied variable while still allowing easy reading of quantiles and of differences between multiple categories. It is a bit of a hybrid between a histogram and a boxplot.
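
One way to read the "representative of the PDF" argument is sketched below. This is not seaborn code: `band_widths` and its parameters are hypothetical, and it assumes that letter values sit at tail probabilities 2**-j and that "area" means width times height equals the share of data in each band between consecutive letter values. Under those assumptions, the width of each band is the average density over that band, much like a histogram bar.

```python
# Hypothetical illustration (not seaborn's implementation).
# Assumption: lower letter values sit at tail probabilities 2**-j, and a band's
# area (width * height) should equal the share of data falling inside it.

def band_widths(quantile, depth):
    """Return the widths of the lower-tail bands, from the median outwards.

    quantile: inverse CDF of the distribution, mapping p in (0, 1) to a value.
    depth: number of letter values to use (median is j = 1).
    """
    widths = []
    for j in range(1, depth):
        p_inner = 2.0 ** -j          # tail probability of the inner letter value
        p_outer = 2.0 ** -(j + 1)    # tail probability of the next letter value out
        share = p_inner - p_outer    # fraction of the data between the two
        height = quantile(p_inner) - quantile(p_outer)  # extent on the data axis
        widths.append(share / height)  # average density over the band
    return widths


if __name__ == "__main__":
    # Uniform(0, 1): the inverse CDF is the identity, so the density is flat.
    print(band_widths(lambda p: p, 5))
```

For Uniform(0, 1) every band comes out with the same width, matching a flat density. Whether this corresponds to Figure 3C of the paper is not something this sketch establishes.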
