[discussion] Adding ECDF to seaborn?
See original GitHub issue
@mwaskom referencing this tweet re: ECDFs: I have a simple implementation ready to go, which I keep stored in TextExpander, and I think it might be a useful contribution for seaborn users.
The simplest unit of visualization is a scatterplot, for which an API might be:
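import numpy as np
import matplotlib.pyplot as plt

def ecdf(df, column, ax=None, step=True):
    # "if ax" logic: fall back to the current axes when none is supplied.
    if ax is None:
        ax = plt.gca()
    # Sorted values on x, cumulative proportion of observations on y.
    x, y = np.sort(df[column]), np.arange(1, len(df) + 1) / len(df)
    if step:
        ax.step(x, y)
    else:
        ax.scatter(x, y)
    return ax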
This plotting unit could easily be inserted into the pairplot as a replacement for the histogram on the diagonal (as an option for end users, of course, not mandatory). I can also see extensions to other kinds of plots, for example plotting multiple ECDFs on the same axes object.
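To make the "multiple ECDFs on the same axes" idea concrete, here is a rough sketch. The function name `ecdf_overlay` and the dict-of-samples input are my own assumptions for illustration, not anything seaborn provides; it only relies on plain matplotlib and NumPy calls.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def ecdf_overlay(samples, ax=None):
    """Draw one ECDF step curve per named sample on a shared Axes.

    `samples` is a hypothetical input format: a dict mapping a label
    to a 1-D sequence of numbers.
    """
    if ax is None:
        _, ax = plt.subplots()
    for label, values in samples.items():
        x = np.sort(np.asarray(values, dtype=float))
        y = np.arange(1, len(x) + 1) / len(x)
        # where="post": the curve jumps to y[i] at x[i] and holds until x[i + 1].
        ax.step(x, y, where="post", label=str(label))
    ax.set_ylabel("Proportion")
    ax.legend()
    return ax


rng = np.random.default_rng(0)
ax = ecdf_overlay({"a": rng.normal(0, 1, 50), "b": rng.normal(1, 1, 50)})
```

Because each curve is just a line on the same Axes, differences in location and spread between groups show up as horizontal shifts and changes in steepness.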
As I understand it, distplot exists, and yes, granted, visualizing histograms is quite idiomatic for many users. That said, I do see some advantages of ECDFs over histograms. The biggest one is that every data point is plotted, so the choice of bins cannot bias how the data appear. I have more details in a blog post, but at a high level, the other big advantage I see is that quantiles can be read off the plot easily. Also, compared to estimating a KDE, we make no assumptions about how the data are distributed (though yes, we can debate whether this is a good or bad thing).
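As a small illustration of the "reading off quantiles" point, the sketch below finds the smallest observed value whose ECDF height reaches a given proportion. The helper names `ecdf_values` and `read_quantile` are made up for this example, and this is only one of several quantile conventions (it does not interpolate between points).

```python
import numpy as np


def ecdf_values(data):
    """Return sorted values and their cumulative proportions."""
    x = np.sort(np.asarray(data, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


def read_quantile(data, q):
    """Smallest observed value whose ECDF height is at least q (0 < q <= 1)."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    x, y = ecdf_values(data)
    # y is increasing, so a binary search locates the first height >= q.
    idx = np.searchsorted(y, q, side="left")
    return x[idx]


sample = [3.0, 1.0, 4.0, 1.5, 9.0, 2.5, 6.0, 5.0]
median = read_quantile(sample, 0.5)
```

On the plot this corresponds to drawing a horizontal line at height q and reading the x value where it first meets the curve; no bin width or bandwidth choice is involved.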
If you’re open to having ECDFs inside seaborn, I’m happy to work on a PR for this. I might need some guidance on whether there are special things I should look out for in the codebase (admittedly, it’ll be my first PR to seaborn). Please let me know; I’m also happy to discuss more here before taking any action.
Issue Analytics
- Created 5 years ago
- Reactions: 3
- Comments: 11 (4 by maintainers)
Closed by #2141
This is almost done. Feel free to weigh in on #2141 if you have thoughts about features or implementation.