[discussion] Adding ECDF to seaborn?

@mwaskom referencing this tweet re: ECDFs; I have a simple implementation ready to go which I have stored in textexpander, but I think it might be a useful contribution to seaborn users.

The simplest unit of visualization is a scatterplot, for which an API might be:

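import matplotlib.pyplot as plt
import numpy as np

def ecdf(df, column, ax=None, step=True):
    # Fall back to the current axes when none is supplied.
    if ax is None:
        ax = plt.gca()
    # Sorted values on x, cumulative proportion of observations on y.
    x = np.sort(df[column])
    y = np.arange(1, len(x) + 1) / len(x)
    if step:
        ax.step(x, y)
    else:
        ax.scatter(x, y)
    return ax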

This plotting unit could easily be dropped into the pairplot as a replacement for the histogram on the diagonal (as an option for end-users, of course, not mandatory). I can also see extensions to other kinds of plots, for example, plotting multiple ECDFs on the same axes object.
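As a rough illustration of the "multiple ECDFs on the same axes" idea, here is a minimal sketch using only numpy and matplotlib. The helper names `ecdf_xy` and `ecdf_overlay` are hypothetical and not part of seaborn, and the input is assumed to be a plain mapping from label to values (which could be built from a DataFrame groupby). It also assumes `where="post"` for the step drawing, which differs from the default used in the snippet above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def ecdf_xy(values):
    """Return sorted values and the cumulative proportion at each one."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


def ecdf_overlay(groups, ax=None):
    """Draw one ECDF per entry of `groups` (label -> values) on shared axes.

    Hypothetical helper for illustration only.
    """
    if ax is None:
        ax = plt.gca()
    for label, values in groups.items():
        x, y = ecdf_xy(values)
        # "post" holds each height until the next observation is reached.
        ax.step(x, y, where="post", label=str(label))
    ax.set_ylabel("Proportion")
    ax.legend()
    return ax


groups = {
    "a": [1.0, 2.0, 3.0],
    "b": [2.5, 3.5, 4.5, 5.5],
}
fig, ax = plt.subplots()
ecdf_overlay(groups, ax=ax)
fig.savefig("ecdf_overlay.png")
```

Because each group is normalised by its own size, the curves all end at 1 and can be compared directly even when the groups have different numbers of observations.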

As I understand it, distplot exists, and yes, granted, visualizing histograms is quite idiomatic for many users. That said, I do see some advantages of using ECDFs over histograms, the biggest one being that all data points are plotted, so the choice of bins cannot bias how the data look. I have more details in a blog post, but at a high level, the other big advantage I can see is reading off quantiles from the data easily. Also, compared to estimating a KDE, we make no assumptions regarding how the data are distributed (though yes, we can debate whether this is a good or bad thing).
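To make the "reading off quantiles" point concrete, here is a small numpy-only sketch. The function names are hypothetical, and the quantile rule used (smallest observed value whose ECDF height reaches q) is one common convention that I am assuming here; other interpolation rules exist.

```python
import numpy as np


def ecdf_values(data):
    """Return sorted data and the cumulative proportion at each point."""
    x = np.sort(np.asarray(data, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


def quantile_from_ecdf(data, q):
    """Smallest observed value whose ECDF height is at least q (0 < q <= 1)."""
    x, y = ecdf_values(data)
    idx = np.searchsorted(y, q, side="left")
    return x[idx]


def proportion_at_or_below(data, value):
    """Height of the ECDF at `value`: share of observations <= value."""
    x, _ = ecdf_values(data)
    return np.searchsorted(x, value, side="right") / len(x)


sample = [3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.0, 3.5]
median = quantile_from_ecdf(sample, 0.5)
share = proportion_at_or_below(sample, 3.0)
print(median, share)
```

Reading the plot horizontally at a height gives a quantile, and reading it vertically at a value gives the proportion of observations at or below that value; no bin width or bandwidth is involved in either direction.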

If you’re open to having ECDFs inside seaborn, I’m happy to work on a PR for this. I might need some guidance on whether there are any special things I need to look out for in the codebase (admittedly, it’ll be my first PR to seaborn). Please let me know; I’m also happy to discuss more here before taking any action.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 3
  • Comments: 11 (4 by maintainers)

Top GitHub Comments

6 reactions
mwaskom commented, Aug 11, 2020

Closed by #2141

3 reactions
mwaskom commented, Jun 15, 2020

This is almost done. Feel free to weigh in on #2141 if you have thoughts about features or implementation.
