Proposal: Dataset in Tabular Format
Hi, this is a proposal for @theiliad about changing the shape of the data object given as chart input.
This is based on a pattern we often use at Accurat, which is to always start from non-nested datasets.
// Dataset with columns `x`, `y`, `category`
const tabularDataset = [
  { x: new Date(2020, 1, 1), y: 32100, category: 'A' },
  { x: new Date(2020, 1, 2), y: 23500, category: 'A' },
  { x: new Date(2020, 1, 3), y: 53100, category: 'A' },
  { x: new Date(2020, 1, 4), y: 42300, category: 'A' },
  { x: new Date(2020, 1, 5), y: 12300, category: 'A' },
]
We find that tabular datasets, because they avoid nesting, are:
- easy to generate for the developer using the chart, since the data very often comes from a CSV import
- easy to manipulate from the chart internals
- easy to debug, since each datapoint/datasymbol is associated with a single dataset row/object
- pretty simple to transform into a nested format (`lodash.groupBy` or `d3.nest`)
- usable for different charts without changing the shape. They only depend on the semantic meaning of the chart.
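To illustrate the "simple to transform into a nested format" point, here is a minimal sketch in plain JavaScript. The helper `groupByColumn` is a hypothetical stand-in that mimics the basic string-key behaviour of `lodash.groupBy`; it is not part of Carbon Charts.

```javascript
// Hypothetical helper: groups tabular rows by the value of one column.
// It mimics the basic string-key behaviour of lodash's `groupBy`.
function groupByColumn(rows, column) {
  return rows.reduce((groups, row) => {
    const key = row[column];
    (groups[key] = groups[key] || []).push(row);
    return groups;
  }, {});
}

const rows = [
  { x: "Qty", y: 32100, group: "Dataset 1" },
  { x: "More", y: 23500, group: "Dataset 1" },
  { x: "Qty", y: 34200, group: "Dataset 2" },
];

const nested = groupByColumn(rows, "group");
console.log(Object.keys(nested)); // [ 'Dataset 1', 'Dataset 2' ]
console.log(nested["Dataset 1"].length); // 2
```

The rows themselves are untouched, so the chart internals can group by any column (`group`, `x`, ...) depending on what they need to draw.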
Below are two examples of converting from the old format to the proposed one.
AS IS:
const lineData_AS_IS = {
  labels: ["Qty", "More", "Sold", "Restocking", "Misc"],
  datasets: [
    {
      label: "Dataset 1",
      data: [32100, 23500, 53100, 42300, 12300]
    },
    {
      label: "Dataset 2",
      data: [34200, 53200, 42300, 21400, 0]
    },
    {
      label: "Dataset 3 long name",
      data: [41200, 23400, 34210, 1400, 42100]
    },
    {
      label: "Dataset 4 long name",
      data: [22000, 1200, 9000, 24000, 3000]
    },
    {
      label: "Dataset 5 long name",
      data: [2412, 30000, 10000, 5000, 31000]
    },
    {
      label: "Dataset 6 long name",
      data: [0, 20000, 40000, 60000, 80000]
    }
  ]
};
const lineTimeSeriesData_AS_IS = {
  datasets: [
    {
      label: "Dataset 1",
      data: [
        { date: new Date(2019, 0, 1), value: 10000 },
        { date: new Date(2019, 0, 5), value: 65000 },
        { date: new Date(2019, 0, 8), value: 10000 },
        { date: new Date(2019, 0, 13), value: 49213 },
        { date: new Date(2019, 0, 17), value: 51213 }
      ]
    },
    {
      label: "Dataset 2",
      data: [
        { date: new Date(2019, 0, 2), value: 0 },
        { date: new Date(2019, 0, 6), value: 57312 },
        { date: new Date(2019, 0, 8), value: 21432 },
        { date: new Date(2019, 0, 15), value: 70323 },
        { date: new Date(2019, 0, 19), value: 21300 }
      ]
    },
    {
      label: "Dataset 3",
      data: [
        { date: new Date(2019, 0, 1), value: 50000 },
        { date: new Date(2019, 0, 5), value: 15000 },
        { date: new Date(2019, 0, 8), value: 20000 },
        { date: new Date(2019, 0, 13), value: 39213 },
        { date: new Date(2019, 0, 17), value: 61213 }
      ]
    },
    {
      label: "Dataset 4",
      data: [
        { date: new Date(2019, 0, 2), value: 10 },
        { date: new Date(2019, 0, 6), value: 37312 },
        { date: new Date(2019, 0, 8), value: 51432 },
        { date: new Date(2019, 0, 15), value: 40323 },
        { date: new Date(2019, 0, 19), value: 31300 }
      ]
    }
  ]
};
TO BE:
The proposed data format has basically the same shape as a CSV converted to JSON.
const lineData_TO_BE = [
  { group: "Dataset 1", y: 32100, x: "Qty" },
  { group: "Dataset 1", y: 23500, x: "More" },
  { group: "Dataset 1", y: 53100, x: "Sold" },
  { group: "Dataset 1", y: 42300, x: "Restocking" },
  { group: "Dataset 1", y: 12300, x: "Misc" },
  { group: "Dataset 2", y: 34200, x: "Qty" },
  { group: "Dataset 2", y: 53200, x: "More" },
  { group: "Dataset 2", y: 42300, x: "Sold" },
  { group: "Dataset 2", y: 21400, x: "Restocking" },
  { group: "Dataset 2", y: 0, x: "Misc" },
  { group: "Dataset 3 long name", y: 41200, x: "Qty" },
  { group: "Dataset 3 long name", y: 23400, x: "More" },
  { group: "Dataset 3 long name", y: 34210, x: "Sold" },
  { group: "Dataset 3 long name", y: 1400, x: "Restocking" },
  { group: "Dataset 3 long name", y: 42100, x: "Misc" },
  { group: "Dataset 4 long name", y: 22000, x: "Qty" },
  { group: "Dataset 4 long name", y: 1200, x: "More" },
  { group: "Dataset 4 long name", y: 9000, x: "Sold" },
  { group: "Dataset 4 long name", y: 24000, x: "Restocking" },
  { group: "Dataset 4 long name", y: 3000, x: "Misc" },
  { group: "Dataset 5 long name", y: 2412, x: "Qty" },
  { group: "Dataset 5 long name", y: 30000, x: "More" },
  { group: "Dataset 5 long name", y: 10000, x: "Sold" },
  { group: "Dataset 5 long name", y: 5000, x: "Restocking" },
  { group: "Dataset 5 long name", y: 31000, x: "Misc" },
  { group: "Dataset 6 long name", y: 0, x: "Qty" },
  { group: "Dataset 6 long name", y: 20000, x: "More" },
  { group: "Dataset 6 long name", y: 40000, x: "Sold" },
  { group: "Dataset 6 long name", y: 60000, x: "Restocking" },
  { group: "Dataset 6 long name", y: 80000, x: "Misc" },
];
const lineTimeSeriesData_TO_BE = [
  { group: "Dataset 1", x: new Date(2019, 0, 1), y: 10000 },
  { group: "Dataset 1", x: new Date(2019, 0, 5), y: 65000 },
  { group: "Dataset 1", x: new Date(2019, 0, 8), y: 10000 },
  { group: "Dataset 1", x: new Date(2019, 0, 13), y: 49213 },
  { group: "Dataset 1", x: new Date(2019, 0, 17), y: 51213 },
  { group: "Dataset 2", x: new Date(2019, 0, 2), y: 0 },
  { group: "Dataset 2", x: new Date(2019, 0, 6), y: 57312 },
  { group: "Dataset 2", x: new Date(2019, 0, 8), y: 21432 },
  { group: "Dataset 2", x: new Date(2019, 0, 15), y: 70323 },
  { group: "Dataset 2", x: new Date(2019, 0, 19), y: 21300 },
  { group: "Dataset 3", x: new Date(2019, 0, 1), y: 50000 },
  { group: "Dataset 3", x: new Date(2019, 0, 5), y: 15000 },
  { group: "Dataset 3", x: new Date(2019, 0, 8), y: 20000 },
  { group: "Dataset 3", x: new Date(2019, 0, 13), y: 39213 },
  { group: "Dataset 3", x: new Date(2019, 0, 17), y: 61213 },
  { group: "Dataset 4", x: new Date(2019, 0, 2), y: 10 },
  { group: "Dataset 4", x: new Date(2019, 0, 6), y: 37312 },
  { group: "Dataset 4", x: new Date(2019, 0, 8), y: 51432 },
  { group: "Dataset 4", x: new Date(2019, 0, 15), y: 40323 },
  { group: "Dataset 4", x: new Date(2019, 0, 19), y: 31300 }
];
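Since the proposed format has the shape of a CSV converted to JSON, rows like the ones above could come almost directly from a CSV import. The sketch below uses a deliberately naive, hypothetical parser (`csvToRows`, no quoted fields or escaping) only to show the shape; a real project would likely rely on a proper CSV library.

```javascript
// Naive CSV parser for illustration only: no quoted fields, no escaping.
function csvToRows(csvText) {
  const [headerLine, ...lines] = csvText.trim().split("\n");
  const columns = headerLine.split(",");
  return lines.map((line) => {
    const cells = line.split(",");
    const row = {};
    columns.forEach((column, i) => {
      row[column] = cells[i];
    });
    return row;
  });
}

const csv = `group,x,y
Dataset 1,Qty,32100
Dataset 1,More,23500
Dataset 2,Qty,34200`;

// Cells are parsed as strings, so numeric columns are converted explicitly.
const lineData = csvToRows(csv).map((row) => ({ ...row, y: Number(row.y) }));
console.log(lineData[0]); // { group: 'Dataset 1', x: 'Qty', y: 32100 }
```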
Problem 1: Column names
The two examples above use the strings `group`, `x`, `y` as columns (object keys), and some more (`y2`, `color`) could be added in the future.
But as @theiliad pointed out, Carbon Charts are more generic and don’t rely on the horizontal axis being called X and the vertical one Y.
Another option could be to let the user specify the “axis - column” association as a configuration option.
This would mean the dataset could have columns [date, country, value]
and the configuration option could assign them:
axesColumns: {
  bottom: 'date',
  left: 'value',
  group: 'country',
}
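As a rough sketch of how the chart internals might consume such an option: the `axesColumns` name comes from the snippet above, while the `options` wrapper, the `valueFor` helper and the sample countries "A" and "B" are made up for illustration.

```javascript
// Hypothetical option shape, following the `axesColumns` idea above.
const options = {
  axesColumns: { bottom: "date", left: "value", group: "country" },
};

// Sample rows with made-up countries; the columns are [date, country, value].
const data = [
  { date: new Date(2019, 0, 1), country: "A", value: 10000 },
  { date: new Date(2019, 0, 5), country: "A", value: 65000 },
  { date: new Date(2019, 0, 2), country: "B", value: 0 },
];

// Reads the value a row contributes to a given role ("bottom", "left", "group").
function valueFor(row, role, axesColumns) {
  return row[axesColumns[role]];
}

const leftValues = data.map((row) => valueFor(row, "left", options.axesColumns));
console.log(leftValues); // [ 10000, 65000, 0 ]
```

With this kind of indirection the dataset keeps its own column names, and only the configuration knows which column feeds which axis.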
Problem 2: Backwards compatibility
We think that backward compatibility could be obtained by supporting both formats for a limited period of time, transforming one into the other.
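A minimal sketch of what such a transformation could look like, assuming the two nested variants shown earlier (`labels` plus arrays of numbers, and arrays of `{ date, value }` objects). The function names `nestedToTabular` and `normalizeData` are hypothetical, not an existing Carbon Charts API.

```javascript
// Hypothetical converter from the nested format to tabular rows.
// Handles both variants shown earlier:
//  - `labels` + arrays of numbers (categorical x)
//  - arrays of { date, value } objects (time series)
function nestedToTabular(nested) {
  const rows = [];
  for (const dataset of nested.datasets) {
    dataset.data.forEach((point, i) => {
      if (point !== null && typeof point === "object") {
        rows.push({ group: dataset.label, x: point.date, y: point.value });
      } else {
        rows.push({ group: dataset.label, x: nested.labels[i], y: point });
      }
    });
  }
  return rows;
}

// Accepts either format: arrays are assumed to be tabular already.
function normalizeData(input) {
  return Array.isArray(input) ? input : nestedToTabular(input);
}

const oldLineData = {
  labels: ["Qty", "More"],
  datasets: [
    { label: "Dataset 1", data: [32100, 23500] },
    { label: "Dataset 2", data: [34200, 53200] },
  ],
};

const converted = normalizeData(oldLineData);
console.log(converted.length); // 4
console.log(converted[0]); // { group: 'Dataset 1', x: 'Qty', y: 32100 }
```

Checking `Array.isArray` is only one possible way to tell the formats apart; it works here because the old format is always an object with a `datasets` key.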
Many things are still to be defined and we have many ideas. We are curious to hear if anyone has any thoughts!
Top GitHub Comments
@theiliad I’ve reported what you say under Problem 1, sorry if I wasn’t clear! About the “non significant advantage”, you may have a point. However, let me add to the pros how simple it becomes to add a three-dimensional chart such as the Heatmap or the continuous-color-coded Scatterplot. I think the nested data structure for those could be pretty complicated.
@cal-smith thanks! I agree with you that we could also make function-based accessors; it’s a very common pattern in `lodash`, so it definitely makes sense. The only advantage I can think of for using strings over functions is that they are JSON-serializable, but I don’t know if that’s of importance to you.

I definitely prefer the tabular format over the current format. To my mind it’s significantly clearer.
I agree with
However we can sort that with some mapping options/functions … something to the effect of:
so the type would look like `map: string | (data) => any`. Should we need grouping or other values to order by, we can use the same type signature.

It should be fairly easy to write a function to map between the old format and a tabular format … worst case it may also be feasible to support both.
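As a sketch of how a `string | (data) => any` mapping could be handled internally, both forms can be normalized into a function. The helper name `toAccessor` is hypothetical.

```javascript
// Turns a `string | (data) => any` mapping into an accessor function.
function toAccessor(map) {
  return typeof map === "function" ? map : (row) => row[map];
}

const row = { date: new Date(2019, 0, 1), value: 10000 };

// String form: plain column lookup (JSON-serializable configuration).
const getValue = toAccessor("value");
// Function form: arbitrary derived value.
const getYear = toAccessor((r) => r.date.getFullYear());

console.log(getValue(row)); // 10000
console.log(getYear(row)); // 2019
```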