boxenplot area scale calculation
The “area” method for calculating the width of boxenplot letter-value boxes, in https://github.com/mwaskom/seaborn/blob/master/seaborn/categorical.py#L1890, is:
'area': lambda h, i, k: (1 - 2**(-k + i - 2)) / h
IIUC, in order for the area to be proportional to the percentage of data covered, as documented (https://github.com/mwaskom/seaborn/blob/master/seaborn/categorical.py#L2672), the formula should rather be:
'area': lambda h, i, k: (1 - 2**(-k + i - 1)) / h
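The two formulas can be compared numerically. This sketch rests on an assumption that the issue does not state: there are k + 1 nested boxes, and box i (0 = outermost, k = innermost) leaves a tail of 2**(i - k - 2) of the data on each side, so the innermost box is the interquartile range. Under that assumption box i covers 1 - 2**(i - k - 1) of the data. If box area is proportional to coverage, area / coverage should be the same for every box. The helper names (`coverage`, `current_width`, `proposed_width`, `area_to_coverage`) are hypothetical.

```python
# ASSUMPTION (not stated in the issue): k + 1 nested boxes; box i
# (0 = outermost, k = innermost) leaves a tail of 2 ** (i - k - 2) of the
# data on each side, so box k is the interquartile range.

def coverage(i, k):
    """Fraction of the data inside box i under the assumed scheme."""
    tail = 2.0 ** (i - k - 2)  # data fraction below the box (and above it)
    return 1.0 - 2.0 * tail


def current_width(h, i, k):
    """'area' width formula as quoted from the current code."""
    return (1 - 2**(-k + i - 2)) / h


def proposed_width(h, i, k):
    """'area' width formula proposed in the issue."""
    return (1 - 2**(-k + i - 1)) / h


def area_to_coverage(width_fn, k, heights):
    """Return area / coverage for every box; a constant list means the
    box area is proportional to the fraction of data the box covers."""
    return [width_fn(h, i, k) * h / coverage(i, k) for i, h in enumerate(heights)]


heights = [4.0, 2.5, 1.0, 0.5]  # arbitrary box heights for k = 3
print(area_to_coverage(proposed_width, 3, heights))  # all approximately 1.0
print(area_to_coverage(current_width, 3, heights))   # grows toward the inner boxes
```

Because each width is divided by h and then multiplied by h again, the check depends only on the numerators. Under this assumed percentile scheme, the proposed numerator equals the coverage, while the current one gives the inner boxes more area than their share of the data.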
Issue Analytics
- Created 3 years ago
- Comments: 25 (14 by maintainers)
Top GitHub Comments
Is there a compelling reason to use area? Could we just deprecate it?
Hello, I’m digging up this issue because, when comparing “area” scaling to the original paper’s representation (Figure 3C, https://doi.org/10.1080/10618600.2017.1305277), it seems that the Seaborn implementation is still incorrect.
[Figures: Seaborn output for random uniform data; expected result from the paper]
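A comparison of this kind can also be run numerically, without plotting. If “area” widths are meant to track the underlying density, uniform data should give boxes of nearly equal width. The sketch below applies both quoted formulas to simulated uniform data. The percentile scheme for the box ends is an assumption: if the installed seaborn version places the box ends differently, both formulas can behave differently, so adjust `tails` to match the code being checked. All function names are hypothetical.

```python
import numpy as np


def current_width(h, i, k):
    """'area' width as quoted in the issue (current code)."""
    return (1 - 2**(-k + i - 2)) / h


def proposed_width(h, i, k):
    """'area' width as proposed in the issue."""
    return (1 - 2**(-k + i - 1)) / h


def box_heights(data, k):
    """Heights of k + 1 nested boxes (index 0 = outermost).

    ASSUMPTION: box i leaves a tail of 2 ** (i - k - 2) of the data on each
    side, so box k (the innermost) is the interquartile range.
    """
    tails = 2.0 ** (np.arange(k + 1) - k - 2)
    return np.quantile(data, 1.0 - tails) - np.quantile(data, tails)


def normalized_widths(width_fn, data, k):
    """Apply width_fn to every box, then rescale so the widest box is 1."""
    widths = np.array([width_fn(h, i, k) for i, h in enumerate(box_heights(data, k))])
    return widths / widths.max()


rng = np.random.default_rng(0)
uniform = rng.uniform(size=100_000)
current_w = normalized_widths(current_width, uniform, k=4)
proposed_w = normalized_widths(proposed_width, uniform, k=4)
print(current_w)   # widths grow toward the center
print(proposed_w)  # widths all close to 1
```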
For me, “area” scaling is the “correct” way of drawing boxenplots, as it directly represents the underlying PDF of the studied variable while still allowing easy reading of quantiles and of differences between multiple categories. It is a bit of a hybrid between a histogram and a boxplot.
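The “PDF” reading can be made concrete with a small sketch. This is a hypothetical illustration, not seaborn’s implementation and not necessarily the paper’s definition. Each ring (the part of box i outside box i + 1, both sides together) gets a width equal to the fraction of data it contains divided by its total height, which is its average density, as with a histogram bar. The nested-box percentile scheme and the rescaling to a maximum width of 1 are assumptions, and `density_widths` is a hypothetical name.

```python
import numpy as np


def density_widths(data, k):
    """Hypothetical 'density' widths for k + 1 nested letter-value boxes.

    ASSUMPTION: box i (0 = outermost, k = innermost) leaves a tail of
    2 ** (i - k - 2) of the data on each side, so box k is the IQR.
    Each ring (box i minus box i + 1, both sides together) gets a width
    equal to its average density: fraction of data / total height.
    """
    data = np.asarray(data, dtype=float)
    tails = 2.0 ** (np.arange(k + 1) - k - 2)
    coverage = 1.0 - 2.0 * tails  # fraction of the data inside each box
    heights = np.quantile(data, 1.0 - tails) - np.quantile(data, tails)

    widths = np.empty(k + 1)
    for i in range(k):
        # Data in the ring divided by the combined length of its two sides.
        widths[i] = (coverage[i] - coverage[i + 1]) / (heights[i] - heights[i + 1])
    widths[k] = coverage[k] / heights[k]  # innermost box is its own "ring"
    return widths / widths.max()  # rescale so the widest box has width 1


rng = np.random.default_rng(1)
uniform_w = density_widths(rng.uniform(size=1_000_000), k=4)
normal_w = density_widths(rng.standard_normal(1_000_000), k=4)
print(uniform_w)  # flat density: all widths near 1
print(normal_w)   # peaked density: widths increase toward the center
```

Unlike the formulas quoted in the issue, this ties each width to the data between consecutive letter values rather than to the whole box. That is one way to read “directly representative of the underlying PDF”.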