Time-series Line / Bar chart: Add option to plot 0 if no data.
Issue
I have a simple metric (say: SUM(orders_count)), and I would like to track its evolution over time. My objective is to visualize the total at different time granularities (group by hour, day, week, etc.).
However, I don’t always have records for each time bucket (e.g. I didn’t have any orders between 2 and 3am, or on Sunday, etc.). In this situation, I want to plot 0 for each time bucket I had no data for.
Currently, the line chart simply omits the missing time bucket and interpolates the line between the previous and next data points.
I am aware that I can use the pandas resampling methods to plot a 0 data point for each hour in which I had no orders.
However, this workaround doesn’t “follow” the time granularity I aggregate data on.
This means that each time I want to change the granularity, I have to modify it in two different places to get the correct graph.
Moreover, it means I cannot pass the time granularity as a parameter from a native filter.
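To make the mismatch concrete, here is a minimal pandas sketch of the workaround and of what happens when the two settings drift apart. The sample data and variable names are made up for illustration, and the code only mimics the chart's grouping and the resample step; it is not Superset's actual implementation.

```python
import pandas as pd

# Hypothetical order events; there are no orders between 02:00 and 04:00.
orders = pd.Series(
    [3, 1, 2, 4],
    index=pd.to_datetime(
        ["2021-05-01 00:15", "2021-05-01 01:40",
         "2021-05-01 04:05", "2021-05-02 09:30"]
    ),
    name="orders_count",
)

# Chart grouped by hour, resample rule also hourly:
# hours without any order show up as 0.
hourly = orders.resample("h").sum()

# Chart regrouped by day while the resample rule is still hourly:
# the hours between the two daily totals are filled with spurious zeros.
daily = orders.resample("D").sum()
mismatched = daily.resample("h").sum()

print(hourly.head())
print(mismatched.head())
```

With the rule kept at hourly, the daily chart gets one real total per day separated by a run of zero buckets, which is why the rule has to be edited every time the time grain changes.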
Requested feature
I would like a simple option to replace missing values (i.e. time buckets with no record to aggregate) with either:
- linear interpolation (current behaviour),
- nothing (can be emulated via pandas.resample(<granularity>, mean)),
- 0 (can be emulated via pandas.resample(<granularity>, sum)).
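The three behaviours listed above can be sketched in pandas as follows. The helper name `fill_missing_buckets` and its `mode` values are hypothetical, chosen only for this illustration. Note one deliberate difference from the list: for the "nothing" case the sketch uses `sum(min_count=1)` rather than `mean`, so the metric stays a sum while empty buckets remain NaN.

```python
import pandas as pd


def fill_missing_buckets(series, rule, mode):
    """Sum `series` per `rule` bucket and treat empty buckets per `mode`.

    Hypothetical helper: `mode` is "interpolate", "gap" or "zero".
    """
    if mode == "zero":
        # The sum of an empty bucket is 0.
        return series.resample(rule).sum()
    # min_count=1 leaves buckets without any record as NaN.
    totals = series.resample(rule).sum(min_count=1)
    if mode == "gap":
        return totals
    if mode == "interpolate":
        # Draw a straight line between the neighbouring data points.
        return totals.interpolate(method="linear")
    raise ValueError(f"unknown mode: {mode!r}")


# Made-up sample: no orders between 02:00 and 04:00.
orders = pd.Series(
    [3, 1, 2],
    index=pd.to_datetime(
        ["2021-05-01 00:15", "2021-05-01 01:40", "2021-05-01 04:05"]
    ),
)

zero = fill_missing_buckets(orders, "h", "zero")
gap = fill_missing_buckets(orders, "h", "gap")
interp = fill_missing_buckets(orders, "h", "interpolate")
```

Because `rule` is a single argument, the same value could in principle drive both the aggregation and the fill, which is the coupling this request asks for.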
Metabase uses a simple dropdown menu for that.
Alternatives
As seen above, using pandas.resample only partially solves the issue, as it doesn’t dynamically adjust to the selected time granularity.
One could of course write a custom query that joins the list of every single hour between min(timestamp) and max(timestamp) to force the generation of records for these time buckets, but it’s a lot of work, and again it would not adapt dynamically to different time grains.
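For reference, the "join against every bucket" idea can be sketched in pandas rather than SQL. The function name and the column names `timestamp` and `orders_count` are assumptions made for this example. Unlike a hand-written query, the grain is a parameter here, but only fixed-width grains (hour, day) work with `dt.floor`.

```python
import pandas as pd


def zero_fill_with_spine(events, freq):
    """Aggregate per bucket, then align on a gap-free list of buckets.

    Hypothetical helper; `events` needs "timestamp" and "orders_count"
    columns, and `freq` must be a fixed-width grain such as "h" or "D".
    """
    buckets = events["timestamp"].dt.floor(freq)
    totals = events.groupby(buckets)["orders_count"].sum()
    # Every bucket between the first and last observed one.
    spine = pd.date_range(buckets.min(), buckets.max(), freq=freq, name="bucket")
    # Equivalent to a left join from the spine, with missing totals set to 0.
    return totals.reindex(spine, fill_value=0)


# Made-up sample: no orders between 02:00 and 04:00.
events = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2021-05-01 00:15", "2021-05-01 01:40", "2021-05-01 04:05"]
        ),
        "orders_count": [3, 1, 2],
    }
)

per_hour = zero_fill_with_spine(events, "h")
per_day = zero_fill_with_spine(events, "D")
```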
Context
Examples generated on Superset 1.1.0.
Issue Analytics
- State:
- Created 2 years ago
- Reactions: 2
- Comments: 10 (8 by maintainers)
Top GitHub Comments
Thanks for the proposal. I will finish it in the resample option (advanced analytics).
To provide some context: We believe “0-filling” (in addition to interpolation and gaps) can be relevant when, e.g., you want to plot the count of records or sum of a given metric per <time_grain>. In such cases, not having data (for a given time bucket) means something — i.e., that the count/sum is actually 0.