Countplot has inappropriate response to xaxis being a date column
There is a bug in countplot when the x axis is a date column.
The raw seaborn plot shows apparently correct dates, but any interaction with the matplotlib axes object shows that the date information has been damaged and appears to be near the start of unix epoch time.
I believe this is a bug where, somehow, the year information in the original date column is destroyed while making the countplot.
Versions
- seaborn: ‘0.11.1’
- matplotlib: ‘3.4.2’
Reproducible example
Code below generates lineplots and countplots.
The lineplots have correct dates on the x axis. The countplot displays dates in Jan 1970 (start of Unix epoch time) when mdates formatting is applied.
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import datetime as dt
# -- Load and clean subset of taxis data
taxis = sns.load_dataset('taxis')
taxis['pickup'] = pd.to_datetime(taxis['pickup'])
taxis['pickup_date'] = taxis['pickup'].dt.date
date_mask = (taxis['pickup_date']>=dt.date(2019,3,1)) & (taxis['pickup_date']<=dt.date(2019,3,3))
taxis = taxis[date_mask].sort_values(by=['pickup_date'])
def format_time_axis_example(ax):
    """Toy example using mdates to tidy dates on x axis"""
    # setup formats
    days = mdates.DayLocator(bymonthday=range(0, 31, 1)) # every nth day
    minor_fmt = mdates.DateFormatter("%d") # eg. 05
    months = mdates.MonthLocator() # every month
    major_fmt = mdates.DateFormatter("%b%n%Y") # eg. Jan 2021
    # format the ticks
    ax.xaxis.set_major_locator(months)
    ax.xaxis.set_major_formatter(major_fmt)
    ax.xaxis.set_minor_locator(days)
    ax.xaxis.set_minor_formatter(minor_fmt)
    # ensure ticks exist and control appearance
    ax.tick_params(axis="x", which="both", bottom=True) # major+minor displayed
# -- Lineplot with correct response to time axis formatting
print(f"Actual Date Range: {taxis['pickup_date'].min()} -- {taxis['pickup_date'].max()}")
ax = sns.lineplot(x='pickup_date', y='total',data=taxis)
plt.xticks(rotation=45)
plt.show()
ax = sns.lineplot(x='pickup_date', y='total',data=taxis)
format_time_axis_example(ax)
plt.show()
# -- Countplot with incorrect response to time axis formatting
print(f"Actual Date Range: {taxis['pickup_date'].min()} -- {taxis['pickup_date'].max()}")
ax = sns.countplot(x='pickup_date', data=taxis)
plt.show()
ax = sns.countplot(x='pickup_date', data=taxis)
format_time_axis_example(ax)
plt.show()
Lineplot working as expected:
Countplot with damaged date information on the x-axis, rendering in mdates as 1970:
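One possible workaround, sketched here under assumptions rather than taken from the issue, is to skip the categorical countplot and draw the per-day counts with matplotlib's own bar at real date coordinates, so that mdates locators and formatters see genuine dates. The `pickups` series below is a small synthetic stand-in for taxis['pickup'] (hypothetical values), and the Agg backend is chosen only so the snippet runs without a display.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic stand-in for taxis['pickup'] (hypothetical values, not the real dataset)
pickups = pd.Series(pd.to_datetime([
    "2019-03-01 08:15", "2019-03-01 17:40",
    "2019-03-02 09:05",
    "2019-03-03 11:30", "2019-03-03 12:00", "2019-03-03 23:10",
]))

# Count rows per calendar day; the result is indexed by a sorted DatetimeIndex
counts = pickups.dt.normalize().value_counts().sort_index()

# Convert the days to matplotlib date numbers so the bars sit at real dates
x = mdates.date2num(counts.index.to_pydatetime())

fig, ax = plt.subplots()
ax.bar(x, counts.to_numpy(), width=0.8)  # width is in days on a date axis
ax.xaxis_date()                          # treat the x values as dates
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d %b %Y"))
fig.autofmt_xdate()
```

Because the bars are positioned by date number rather than by ordinal category index, a helper such as format_time_axis_example should then label them with 2019 dates instead of 1970 ones.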
Issue Analytics
- State:
- Created 2 years ago
- Comments:7 (4 by maintainers)
Top GitHub Comments
To be clear, it’s a float not an integer; sub-day resolution is possible through fractional values. Actually, using days rather than seconds gives you microsecond precision over a much wider range of dates.
Nope, it’s pretty straightforward (though they did recently realign the epoch, which used to be 0000-12-31 before matplotlib 3.3; basically it was never originally intended to be “unix time”).
Yep.
I’ve been around a lot of datetime datasets from lots of businesses / scientific instruments, and a Unix epoch stored as an integer count of days has never been the raw data type.
I wonder if mdates has some smart fallback where, when it sees a small integer range, it presumes days when it would otherwise have defaulted to seconds or microseconds.
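The behaviour discussed in these comments can be checked directly. The sketch below assumes matplotlib 3.3 or later with the default epoch left unchanged; it shows that there is no fallback involved, because matplotlib date numbers are simply float days since the epoch. Ordinal positions such as 0, 1, 2 (the kind of positions a categorical axis uses) therefore convert to the first days of January 1970.

```python
import datetime as dt

import matplotlib.dates as mdates

# Epoch that matplotlib date numbers are counted from (default assumed unchanged)
epoch = mdates.get_epoch()

# Ordinal positions like the ones a categorical axis uses for its bars
positions = [0, 1, 2]
as_dates = [mdates.num2date(p).date().isoformat() for p in positions]

# Fractional values carry sub-day resolution: 0.5 is noon on the epoch day
half_day = mdates.num2date(0.5)

# A real 2019 date sits many thousands of days away from those positions
march_first = mdates.date2num(dt.datetime(2019, 3, 1))

print(epoch, as_dates, half_day.isoformat(), march_first)
```

This is consistent with the countplot output in the report: the date formatter is handed positions near zero, so it labels them as days near the epoch.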