norm_hist and kde
Hi all,
(first of all: awesome library, I love it)
I am wondering about the default behavior of distplot when norm_hist is False.
At least on 0.8.0,
sns.distplot(x, norm_hist=False)
produces a figure that is 1) normalized and 2) still has the KDE, which is a bit of a gotcha (i.e. unless you carefully read the docs for norm_hist and kde and infer that kde is default-True and might override norm_hist=False).
If you run:
sns.distplot(x, norm_hist=False, kde=False)
This will give you an unnormed, sans-KDE distribution, which is itself a little disappointing since the KDE is actually super nice for understanding the structure of the data.
I can think of two potential ways to address this mild annoyance:
- default kde=None and have it infer whether it should compute a KDE from the value of norm_hist, or
- if norm_hist=False, compute the KDE of the normalized figure, but then multiply it by the integration value of the distribution to put it on the plot. (I am not a statistician, so this seems fine to me, but perhaps isn’t kosher for some reason?)
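To make the second option concrete, here is a minimal sketch in plain Python of what "multiply it by the integration value" could look like. It is not how distplot is implemented: the helpers gaussian_kde and count_histogram are hypothetical names written for this example, and the bandwidth uses Scott's rule as an assumed default. The idea is that a count histogram has total area n * bin_width, so scaling a unit-area density by that factor puts the curve on the same y axis as the counts.

```python
import math
import random


def gaussian_kde(sample, bandwidth):
    """Hypothetical helper: return a Gaussian KDE that integrates to 1."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


def count_histogram(sample, bins, lo, hi):
    """Hypothetical helper: raw counts in equal-width bins over [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for xi in sample:
        idx = min(int((xi - lo) / width), bins - 1)  # xi == hi goes in last bin
        counts[idx] += 1
    return counts, width


random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(200)]

# Bandwidth from Scott's rule (an assumption made for this sketch).
mean = sum(x) / len(x)
std = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (len(x) - 1))
bandwidth = std * len(x) ** (-1 / 5)

counts, bin_width = count_histogram(x, bins=20, lo=min(x), hi=max(x))
density = gaussian_kde(x, bandwidth)

# The unnormalized histogram has area n * bin_width; use that as the scale.
scale = len(x) * bin_width

# Evaluate the scaled curve on a grid that extends past the data on both sides.
grid_n = 1000
start = min(x) - 5 * bandwidth
step = (max(x) + 5 * bandwidth - start) / (grid_n - 1)
grid = [start + i * step for i in range(grid_n)]
kde_counts = [scale * density(g) for g in grid]

# Both areas should agree: the curve now lives on the count axis.
curve_area = sum(kde_counts) * step
hist_area = sum(counts) * bin_width
print(round(curve_area, 3), round(hist_area, 3))
```

Plotting kde_counts against grid on the same axes as the count bars would then give a KDE overlay without normalizing the histogram.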
I’d be open to doing this myself (esp 2), as long as I know you’ll accept the PR 😅 .
Cheers!
Issue Analytics
- State:
- Created 6 years ago
- Reactions:1
- Comments:6 (4 by maintainers)
Hi. Actually I’m quite fond of norm_hist and would appreciate an evolution of it rather than seeing it disappear. As seen in #479 #1396 and #61, in certain situations it’s problematic to not be able to scale or “denormalize” a kde in distplot.
Here is my situation: I plot 2 histograms on the same axes to see the differences. At first I used matplotlib’s hist as a ‘stepfilled’ with low alpha.
I don’t really care about the values on the y axis; I want to keep the proportion between both sets, as said in #61. I wanted to have a better visualization with a kde using distplot. I know a kde is about density and has an area of 1 under the curve, but as I said, I don’t care about the values; I just need to keep the correct proportion between both sets. Here is the code; range was used to keep the same bin width for both sets with kde.
I would have liked to be able to correct this by giving both sets to one distplot rather than doing 2 distplots, or by adding something like norm_kde=False to keep the height of the kde as it is for the histogram. I did it by drawing on different axes and changing the ylim of each kde as a function of the area occupied by each set since, whatever the base area, a kde will have an area of 1.0.
So what I mean is:
norm=False
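For readers who want the proportion-preserving behaviour without juggling two axes, the following is a rough sketch in plain Python rather than anything distplot offers. The helper scaled_kde is a hypothetical name, and the bandwidth is fixed by hand as an assumption. Each curve is scaled by its own sample size times a shared bin width, so the area under each curve stays proportional to the number of observations in that set.

```python
import math
import random


def scaled_kde(sample, bandwidth, bin_width):
    """Hypothetical helper: Gaussian KDE rescaled to the count scale."""
    # A unit-area KDE uses 1 / (n * h * sqrt(2*pi)). Multiplying by
    # n * bin_width cancels n, so a larger sample yields a taller curve.
    factor = bin_width / (bandwidth * math.sqrt(2.0 * math.pi))

    def curve(x):
        return factor * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return curve


random.seed(1)
big = [random.gauss(0.0, 1.0) for _ in range(300)]
small = [random.gauss(1.5, 1.0) for _ in range(100)]

# Shared bin width for both sets, computed from a common range.
lo, hi = min(big + small), max(big + small)
bin_width = (hi - lo) / 20
bandwidth = 0.4  # chosen by hand for this sketch; not seaborn's rule

grid_n = 1000
start = lo - 5 * bandwidth
step = (hi + 5 * bandwidth - start) / (grid_n - 1)
grid = [start + i * step for i in range(grid_n)]

curve_big_fn = scaled_kde(big, bandwidth, bin_width)
curve_small_fn = scaled_kde(small, bandwidth, bin_width)
curve_big = [curve_big_fn(g) for g in grid]
curve_small = [curve_small_fn(g) for g in grid]

# The ratio of areas should match the ratio of sample sizes.
area_big = sum(curve_big) * step
area_small = sum(curve_small) * step
area_ratio = area_big / area_small
size_ratio = len(big) / len(small)
print(round(area_ratio, 3), size_ratio)
```

Both curves can then be drawn on a single axes with one shared y scale, which keeps the visual proportion between the two sets.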
Closed with #2125