Any ways to disable interpolation for irregular dates?
Problem
When plotting time series data with irregular dates, the missing dates are filled in to form a complete calendar, and the corresponding values are interpolated as well. This interpolation may introduce an unexpected “disturbance” into the resulting plot.
import pandas as pd
import altair as alt
calendar_dates = pd.date_range('20190101', '20190531')
holidays = pd.date_range('20190301', '20190331')
x = [d for d in calendar_dates if d not in holidays]
df = pd.DataFrame({'x': x, 'y': list(range(len(x)))})
alt.Chart(df).mark_line().encode(x='x', y='y').interactive()
In the above example, data can only be observed on non-holidays, and all the points should lie on a continuous straight line. A “twist” appears in the middle only because of the interpolation over the holidays, which are of no interest to the analysis. I believe this is a common problem in visualizing time series.
Similar Issues I have researched
Issue #1768 mentioned alt.ImputeParams. But the strange twist still exists, since the dates are still interpolated.
Issue #1005 mentioned a custom domain on a datetime axis. This is a promising solution, but for it to work, altair needs to accept a list of fixed dates as the domain. It seems that this is not supported yet, according to the documentation of the domain parameter:
For temporal fields, domain can be a two-element array minimum and maximum values, in the form of either timestamps or the DateTime definition objects.
Actually, I encountered the same problem in matplotlib and bqplot. I managed to come up with solutions, but they are quite awkward and hacky.
In matplotlib, I use integers on the x-axis and map each integer tick label to a date. This fakes a date axis.
In bqplot, I use two axes: an invisible one with an integer scale and a visible one with a date scale.
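For illustration, the matplotlib workaround described above might look roughly like the sketch below. It is not the exact code from my project: format_date is a hypothetical helper name, and the Agg backend is only selected so that the snippet also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; an assumption so the sketch runs anywhere

import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter, MaxNLocator

# Same irregular dates as in the altair example: all of March is missing.
calendar_dates = pd.date_range("20190101", "20190531")
holidays = pd.date_range("20190301", "20190331")
x = [d for d in calendar_dates if d not in holidays]
y = list(range(len(x)))


def format_date(value, pos=None):
    """Map an integer tick position back to its date; blank if out of range."""
    i = int(round(value))
    return x[i].strftime("%Y-%m-%d") if 0 <= i < len(x) else ""


fig, ax = plt.subplots()
# Plot against the positional index, so missing dates take up no space.
ax.plot(range(len(x)), y)
# Keep ticks on integer positions and label them with the matching date.
ax.xaxis.set_major_locator(MaxNLocator(integer=True))
ax.xaxis.set_major_formatter(FuncFormatter(format_date))
fig.autofmt_xdate()
```

The line comes out straight because the x positions are 0, 1, 2, ... rather than timestamps, but the axis is no longer a real date axis, which is why I call it hacky.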
So I want to discuss: is there a cleaner way to disable this interpolation behavior for irregular time series? And I wonder, given that this interpolation behavior is the default in both bqplot and altair, does it come from the underlying JavaScript library? Can we handle this on the Python side? I am happy to do a PR if needed.
Issue Analytics
- Created 4 years ago
- Comments: 11 (6 by maintainers)
Top GitHub Comments
The chart is not filling in missing values; the chart is connecting adjacent values with a line, because you used mark_line(). The way to tell the renderer not to connect adjacent values with mark_line() is to insert NaNs in the spaces where you don’t want connections.
Alternatively, you can use something like mark_point(), which does not attempt to draw lines between adjacent values.
There is no way to have a continuous temporal encoding that leaves out some blocks of time (by definition that is no longer continuous). Your only option for that is to use an ordinal encoding, which is incompatible with scale-bound selections (i.e. pan and zoom).