Countplot has inappropriate response to xaxis being a date column


There is a bug in countplot when the x axis is a date column.

The raw seaborn plot shows apparently correct dates, but any interaction with the matplotlib axes object shows that the date information has been damaged and appears to be near the start of unix epoch time.

I believe this is a bug where, somehow, the year information in the original date column is destroyed while making the countplot.

Versions

  • seaborn: ‘0.11.1’
  • matplotlib: ‘3.4.2’

Reproducible example

Code below generates lineplots and countplots.

The lineplots have correct dates on the x axis. The countplot displays dates in Jan 1970 (start of unix epoch time) when mdates formatting is applied.

import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import datetime as dt

# -- Load and clean subset of taxis data
taxis = sns.load_dataset('taxis')
taxis['pickup'] = pd.to_datetime(taxis['pickup'])
taxis['pickup_date'] = taxis['pickup'].dt.date
date_mask = (taxis['pickup_date']>=dt.date(2019,3,1)) & (taxis['pickup_date']<=dt.date(2019,3,3))
taxis = taxis[date_mask].sort_values(by=['pickup_date'])

def format_time_axis_example(ax):
    """Toy example using mdates to tidy dates on x axis"""
    # setup formats
    days = mdates.DayLocator(bymonthday=range(1, 32))  # every day of the month
    minor_fmt = mdates.DateFormatter("%d")  # eg. 05
    months = mdates.MonthLocator()  # every month
    major_fmt = mdates.DateFormatter("%b%n%Y")  # eg. Jan 2021

    # format the ticks
    ax.xaxis.set_major_locator(months)
    ax.xaxis.set_major_formatter(major_fmt)
    ax.xaxis.set_minor_locator(days)
    ax.xaxis.set_minor_formatter(minor_fmt)

    # ensure ticks exist and control appearance
    ax.tick_params(axis="x", which="both", bottom=True)  # major+minor displayed


# -- Lineplot with correct response to time axis formatting
print(f"Actual Date Range: {taxis['pickup_date'].min()} -- {taxis['pickup_date'].max()}")
ax = sns.lineplot(x='pickup_date', y='total',data=taxis)
plt.xticks(rotation=45)
plt.show()
ax = sns.lineplot(x='pickup_date', y='total',data=taxis)
format_time_axis_example(ax)
plt.show()

# -- Countplot with incorrect response to time axis formatting
print(f"Actual Date Range: {taxis['pickup_date'].min()} -- {taxis['pickup_date'].max()}")
ax = sns.countplot(x='pickup_date', data=taxis)
plt.show()
ax = sns.countplot(x='pickup_date', data=taxis)
format_time_axis_example(ax)
plt.show()

Lineplot working as expected: image

Countplot with damaged date information on the x-axis, rendering in mdates as 1970: image
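The 1970 dates are consistent with how categorical axes work: the seaborn documentation says countplot draws data at ordinal positions (0, 1, … n), while matplotlib's date formatters read axis values as days since its epoch. Below is a minimal sketch of that mismatch. It uses plain matplotlib bars with string categories as a stand-in for countplot (an assumption; it does not call seaborn), the counts are made up, and it assumes matplotlib 3.3 or later with the default 1970-01-01 epoch.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

dates = [dt.date(2019, 3, 1), dt.date(2019, 3, 2), dt.date(2019, 3, 3)]
counts = [3, 5, 2]  # made-up counts, for illustration only

fig, (ax_cat, ax_date) = plt.subplots(1, 2)
ax_cat.bar([str(d) for d in dates], counts)  # categorical axis (stand-in for countplot)
ax_date.bar(dates, counts)                   # true date axis


def bar_centers(ax):
    """Return the x position of each bar's center in axis data units."""
    return [round(p.get_x() + p.get_width() / 2, 6) for p in ax.patches]


cat_centers = bar_centers(ax_cat)    # ordinal positions of the categories
date_centers = bar_centers(ax_date)  # days since matplotlib's epoch

# What a date formatter would make of each position
cat_as_dates = [mdates.num2date(c).date() for c in cat_centers]
date_as_dates = [mdates.num2date(c).date() for c in date_centers]

print(cat_centers, cat_as_dates)
print(date_centers, date_as_dates)
```

On the categorical axis the dates survive only as tick label strings, so a date locator or formatter applied afterwards sees just the positions 0, 1, 2.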

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments:7 (4 by maintainers)

Top GitHub Comments

1 reaction
mwaskom commented, Nov 13, 2021

> integer count in days has never been the raw data type

To be clear, it’s a float not an integer; sub-day resolution is possible through fractional values. Actually, using days rather than seconds gives you microsecond precision over a much wider range of dates.
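To make the float-days representation concrete, here is a small sketch that mimics the convention using only the standard library. The helpers `to_float_days` and `from_float_days` are hypothetical names for illustration, not matplotlib functions, and the 1970-01-01 UTC epoch is assumed to match the current matplotlib default.

```python
from datetime import datetime, timedelta, timezone

# Assumed epoch, chosen to match matplotlib's current default
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_float_days(moment):
    """Convert an aware datetime to fractional days since EPOCH."""
    return (moment - EPOCH) / timedelta(days=1)


def from_float_days(days):
    """Convert fractional days since EPOCH back to an aware datetime."""
    return EPOCH + timedelta(days=days)


midnight = datetime(2019, 3, 1, tzinfo=timezone.utc)
noon = datetime(2019, 3, 1, 12, tzinfo=timezone.utc)

print(to_float_days(midnight))  # whole number of days
print(to_float_days(noon))      # same day plus a fraction of 0.5
print(from_float_days(2.0))     # a small value lands just after the epoch
```

The last line shows why positions such as 0, 1, 2 come out as early January 1970 when read on this scale.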

> wonder if mdates has some smart fallback

nope, it’s pretty straightforward (though they did recently realign the epoch, which used to be 0000-12-31; basically it was never originally intended to be “unix time”).
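One way to check the "no fallback" point, sketched under the assumption of matplotlib 3.3 or later (where `mdates.get_epoch` exists and the default epoch is 1970-01-01): `num2date` scales linearly in days whatever the magnitude of its input, so a value that looks like a count of seconds is still read as days.

```python
import datetime as dt

import matplotlib.dates as mdates

# The epoch that axis values are measured from
print(mdates.get_epoch())

small = mdates.num2date(3)      # 3 days after the epoch
large = mdates.num2date(86400)  # 86400 days after the epoch, not 86400 seconds

print(small)
print(large)
print(large - small)  # the gap is a plain difference in days
```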

0 reactions
JMBurley commented, Nov 13, 2021

> What would you have expected, seconds since unix epoch?

Yep.

I’ve been around a lot of datetime datasets from lots of businesses / scientific instruments, and unix epoch as an integer count in days has never been the raw data type.

I wonder if mdates has some smart fallback where when it sees a small integer range it presumes days when it would otherwise have defaulted to seconds or microseconds.
