Any ways to disable interpolation for irregular dates?

Problem

When plotting time series data with irregular dates, the dates are filled in to form a complete calendar, and the corresponding values are interpolated as well. This interpolation may introduce an unexpected “disturbance” into the resulting plot.

import pandas as pd
import altair as alt

# Full calendar, with March 2019 treated as holidays (no observations)
calendar_dates = pd.date_range('20190101', '20190531')
holidays = pd.date_range('20190301', '20190331')
x = [d for d in calendar_dates if d not in holidays]
# y increases by exactly 1 per observed date
df = pd.DataFrame({'x': x, 'y': list(range(len(x)))})

alt.Chart(df).mark_line().encode(x='x', y='y').interactive()

In the above example, data can only be observed on non-holidays, and all the points should lie on a continuous straight line. A “twist” appears in the middle only because of the interpolation across the holidays, which are of no interest in the analysis. I believe this is a common problem in visualizing time series.

Similar Issues I have researched

Issue #1768 mentioned alt.ImputeParams, but the strange twist still exists because the dates are still interpolated.

Issue #1005 mentioned a custom domain on a datetime axis. This is a promising solution, but for it to work, altair would need to accept a list of fixed dates as the domain. This does not seem to be supported yet, according to the documentation of the domain parameter:

For temporal fields, domain can be a two-element array minimum and maximum values, in the form of either timestamps or the DateTime definition objects.

Actually, I have encountered the same problem in matplotlib and bqplot. I managed to come up with workarounds, but they are quite awkward and hacky. In matplotlib, I use integers on the x-axis and map each integer tick label to a date, which fakes a date axis. In bqplot, I use two axes: an invisible one with an integer scale and a visible one with a date scale.
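The matplotlib workaround described above might look roughly like the sketch below. The helper names make_date_formatter and plot_without_gaps are made up for illustration, and the example data simply drops March 2019 to mimic the holidays in the question.

```python
from datetime import date, timedelta

# Observed dates: Jan-May 2019 with March removed (stand-in for the holidays).
start = date(2019, 1, 1)
calendar = [start + timedelta(days=i) for i in range(151)]
dates = [d for d in calendar if d.month != 3]
values = list(range(len(dates)))


def make_date_formatter(observed):
    """Return a tick formatter mapping an integer position to a date label."""
    def fmt(value, _pos=None):
        i = int(round(value))
        if 0 <= i < len(observed):
            return observed[i].strftime('%Y-%m-%d')
        return ''  # ticks outside the data get no label
    return fmt


def plot_without_gaps(observed, ys):
    """Plot against integer positions so missing dates take up no space."""
    # Imported here so the formatter can be used without matplotlib installed.
    import matplotlib.pyplot as plt
    from matplotlib.ticker import FuncFormatter

    fig, ax = plt.subplots()
    ax.plot(range(len(observed)), ys)
    ax.xaxis.set_major_formatter(FuncFormatter(make_date_formatter(observed)))
    fig.autofmt_xdate()  # rotate the date labels so they do not overlap
    return fig
```

Calling plot_without_gaps(dates, values) should give a straight line, since consecutive observations are one unit apart on the x-axis regardless of how many calendar days separate them. The trade-off is that the axis is no longer a true time scale.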

So I want to discuss: is there a cleaner way to disable this interpolation behavior for irregular time series? Given that this behavior is the default in both bqplot and altair, does it come from the underlying JavaScript library? Can we handle this on the Python side? I am happy to do a PR if needed.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 11 (6 by maintainers)

Top GitHub Comments

2 reactions
jakevdp commented, Jun 3, 2021

The chart is not filling-in missing values; the chart is connecting adjacent values with a line, because you used mark_line(). The way to tell the renderer not to connect adjacent values with mark_line() is to insert NaNs in the spaces where you don’t want connections.

Alternatively, you can use something like mark_point() which does not attempt to draw lines between adjacent values.
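As a rough sketch of both suggestions in this comment, the data from the question can be reindexed onto the full calendar so that each holiday becomes a NaN row. The helper names line_with_gap and points_only are hypothetical, and the sketch assumes the renderer breaks the line at missing values, as the comment describes.

```python
import pandas as pd

# Same data as in the question: daily observations with March 2019 missing.
calendar_dates = pd.date_range('20190101', '20190531')
holidays = pd.date_range('20190301', '20190331')
observed = calendar_dates.difference(holidays)
df = pd.DataFrame({'x': observed, 'y': range(len(observed))})

# Reindex onto the full calendar so each holiday becomes a row with y = NaN.
gapped = (
    df.set_index('x')
      .reindex(calendar_dates)
      .rename_axis('x')
      .reset_index()
)


def line_with_gap(data):
    """Line chart over data containing NaN rows (hypothetical helper)."""
    import altair as alt  # imported lazily; only needed for rendering
    return alt.Chart(data).mark_line().encode(x='x:T', y='y:Q').interactive()


def points_only(data):
    """Point chart: no lines are drawn between adjacent values."""
    import altair as alt
    return alt.Chart(data).mark_point().encode(x='x:T', y='y:Q').interactive()
```

With line_with_gap(gapped), the line should stop at the end of February and resume in April. points_only(df) works on the original frame without any NaN padding. In both cases the time axis stays continuous, so March still occupies horizontal space.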

0 reactions
jakevdp commented, Jun 3, 2021

There is no way to have a continuous temporal encoding that leaves out some blocks of time (by definition that is no longer continuous). Your only option for that is to use an ordinal encoding, which is incompatible with scale-bound selections (i.e. pan and zoom).
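One way the ordinal encoding could be tried is sketched below, assuming the dates are first turned into ISO-formatted string labels (an illustrative choice, since such labels sort chronologically). The helper name ordinal_line and the label column are made up for this example.

```python
import pandas as pd

# Same data as in the question: daily observations with March 2019 missing.
calendar_dates = pd.date_range('20190101', '20190531')
holidays = pd.date_range('20190301', '20190331')
observed = calendar_dates.difference(holidays)
df = pd.DataFrame({'x': observed, 'y': range(len(observed))})

# ISO date strings sort alphabetically in chronological order, so an
# ordinal axis built from them keeps the observations in time order.
df['label'] = df['x'].dt.strftime('%Y-%m-%d')


def ordinal_line(data):
    """Line chart on an ordinal axis: only observed dates get a slot."""
    import altair as alt  # imported lazily; only needed for rendering
    return alt.Chart(data).mark_line().encode(
        x=alt.X('label:O', title='date', axis=alt.Axis(labelOverlap=True)),
        y='y:Q',
    )
```

ordinal_line(df) should draw a straight line with no slot for March. As the comment notes, .interactive() is left out because pan and zoom are not available on an ordinal scale.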
