Time-series Line / Bar chart: Add option to plot 0 if no data.

Issue

I have a simple metric (say: SUM(orders_count)), and I would like to track its evolution over time. My objective is to visualize the total at different time granularities (group by hour, day, week, etc.).

However, I don’t always have records for each time bucket (e.g. I didn’t have any orders between 2 and 3am, or on Sunday, etc.). In this situation, I want to plot 0 for each time bucket I have no data for.

Currently, the line chart simply omits the empty time bucket and interpolates the line between the previous and next data points:

Screenshot from 2021-06-08 09-26-46_shadow

I am aware that I can use pandas resampling methods to actually plot a 0 data point on each given hour I had no order:

Screenshot from 2021-06-08 09-27-10_shadow

However, this workaround doesn’t “follow” the time granularity I aggregate data on:

Screenshot from 2021-06-08 09-27-37_shadow

This means that each time I want to update the granularity, I have to modify it at two different places to get the correct graph.

Screenshot from 2021-06-08 09-27-55_shadow

Moreover, this means that I cannot pass the time granularity as a parameter from a native filter.
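To illustrate what a single granularity setting would look like, here is a minimal pandas sketch, not Superset's implementation. The sample data and the helper name `totals_per_bucket` are hypothetical. One `grain` argument drives both the aggregation and the zero-filling, so changing the granularity only requires changing one value.

```python
import pandas as pd

# Hypothetical raw events: order counts with gaps between timestamps.
events = pd.Series(
    [2, 3, 4, 6, 5],
    index=pd.to_datetime(
        [
            "2021-06-01 09:00",
            "2021-06-01 15:00",
            "2021-06-02 10:00",
            "2021-06-04 11:00",
            "2021-06-15 08:00",
        ]
    ),
    name="orders_count",
)


def totals_per_bucket(series: pd.Series, grain: str) -> pd.Series:
    """Sum `series` per time bucket; buckets without any row get 0."""
    # resample() creates every bucket between the first and last timestamp,
    # and sum() over an empty bucket returns 0.
    return series.resample(grain).sum()


daily = totals_per_bucket(events, "D")   # one bucket per calendar day
weekly = totals_per_bucket(events, "W")  # one bucket per week (ending Sunday)

print(daily)
print(weekly)
```

Because the same `grain` value is used for grouping and for gap filling, the two can never get out of sync, which is the behaviour the two separate settings in the chart editor do not guarantee.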

Requested feature

I would like a simple option to replace missing values (i.e. time buckets with no record to aggregate) with either:

linear interpolation (current behaviour),
nothing (can be emulated via pandas.resample(<granularity>, mean)),
0 (can be emulated via pandas.resample(<granularity>, sum)).
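The three behaviours listed above can be sketched in plain pandas as follows. This is an illustration under assumptions, not the chart's actual code: the `daily_totals` series is hypothetical and stands for totals already aggregated per day, with no row for 2021-06-03.

```python
import pandas as pd

# Hypothetical per-day totals; 2021-06-03 is missing because no order was recorded.
daily_totals = pd.Series(
    [5, 4, 6],
    index=pd.to_datetime(["2021-06-01", "2021-06-02", "2021-06-04"]),
    name="orders_count",
)

buckets = daily_totals.resample("D")

# "0": an empty bucket sums to 0.
zero_filled = buckets.sum()

# "nothing": the mean of an empty bucket is NaN, which a chart can draw as a gap.
with_gaps = buckets.mean()

# "linear interpolation": fill the NaN from the neighbouring points.
interpolated = with_gaps.interpolate(method="linear")

print(pd.DataFrame({"zero": zero_filled, "gap": with_gaps, "interp": interpolated}))
```

Each bucket that does contain data holds exactly one value here, so `mean()` and `sum()` both leave those values unchanged; only the treatment of the empty bucket differs.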

Metabase uses a simple dropdown menu for that:

Screenshot from 2021-06-08 11-02-27_shadow.png

Alternatives

As seen above, pandas.resample only partially solves the issue, as it doesn’t dynamically adjust to the selected time granularity.

One could of course write a custom query that joins the list of every single hour between min(timestamp) and max(timestamp) to force-generate records for these time buckets, but that is a lot of work and, again, wouldn’t dynamically adapt to different time grains.
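For reference, the same "join against every bucket" idea can be sketched in pandas rather than SQL. This is an analogue of the custom query described above, not the query itself; the helper name `zero_fill` and the sample data are hypothetical.

```python
import pandas as pd


def zero_fill(totals: pd.Series, freq: str) -> pd.Series:
    """Reindex `totals` on every bucket between its first and last timestamp."""
    # Build the complete list of buckets (the "list of every single hour").
    full_index = pd.date_range(totals.index.min(), totals.index.max(), freq=freq)
    # Equivalent to a left join from the full list onto the data, with 0 for misses.
    return totals.reindex(full_index, fill_value=0)


# Hypothetical hourly totals with no record for the 03:00 bucket.
hourly_totals = pd.Series(
    [3, 1, 2],
    index=pd.to_datetime(
        ["2021-06-08 01:00", "2021-06-08 02:00", "2021-06-08 04:00"]
    ),
    name="orders_count",
)

filled = zero_fill(hourly_totals, "60min")
print(filled)
```

Note that `freq` still has to be passed in by hand and must match the bucket size used to compute `totals`, which is the same limitation as the custom query: the fill step does not follow the chart's time grain on its own.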

Context

Examples generated on Superset 1.1.0.

Issue Analytics

State:
Created 2 years ago
Reactions: 2
Comments: 10 (8 by maintainers)

Top GitHub Comments

2 reactions

zhaoyongjie commented, Jul 1, 2021

Thanks for the proposal. I will implement it in the resample option (advanced analytics).

1 reaction

EBoisseauSierra commented, Jul 1, 2021

To provide some context: We believe “0-filling” (in addition to interpolation and gaps) can be relevant when, e.g., you want to plot the count of records or sum of a given metric per <time_grain>. In such cases, not having data (for a given time bucket) means something — i.e., that the count/sum is actually 0.