Proposal: Dataset in Tabular Format

Hi, this is a proposal for @theiliad about changing the shape of the data object passed to the charts as input.

This is based on a pattern we often use at Accurat, which is to always start from non-nested datasets.

// Dataset with columns `x`, `y`, `category`

const tabularDataset = [
  { x: new Date(2020, 1, 1), y: 32100, category: 'A' },
  { x: new Date(2020, 1, 2), y: 23500, category: 'A' },
  { x: new Date(2020, 1, 3), y: 53100, category: 'A' },
  { x: new Date(2020, 1, 4), y: 42300, category: 'A' },
  { x: new Date(2020, 1, 5), y: 12300, category: 'A' },
]

Because they avoid nesting, we find that tabular datasets are:

  • easy to generate for the developer using the chart, since they very often come straight from a CSV import
  • easy to manipulate from the chart internals
  • easy to debug, since each data point/data symbol is associated with a single dataset row/object
  • simple to transform into a nested format (e.g. with lodash.groupBy or d3.nest)
  • usable for different charts without changing their shape; they depend only on the semantic meaning of the chart.
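As an illustration of the point about transforming into a nested format, here is a minimal sketch that nests a tabular dataset by one of its columns without any library. The helper name `nestBy`, its `{ key, values }` output shape (modelled loosely on what d3.nest returned) and the extra category 'B' row are assumptions for illustration, not an existing Carbon Charts API.

```javascript
// Hypothetical helper: group tabular rows by the value of one column,
// producing a nested array of { key, values } entries (similar in spirit to d3.nest).
function nestBy(rows, column) {
  const groups = new Map(); // a Map keeps keys in first-seen order
  for (const row of rows) {
    const key = row[column];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  // Array.from over a Map yields [key, value] entries
  return Array.from(groups, ([key, values]) => ({ key, values }));
}

// Two rows from the example above plus one hypothetical category 'B' row
const rows = [
  { x: new Date(2020, 1, 1), y: 32100, category: 'A' },
  { x: new Date(2020, 1, 2), y: 23500, category: 'A' },
  { x: new Date(2020, 1, 1), y: 11000, category: 'B' },
];

const nested = nestBy(rows, 'category');
// nested[0].key === 'A' with 2 rows, nested[1].key === 'B' with 1 row
```

The tabular rows stay the single source of truth; any nested view is derived on demand.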

Below are two examples of converting from the current format to the proposed one.

AS IS:

https://github.com/carbon-design-system/carbon-charts/blob/master/packages/core/demo/demo-data/line.ts#L81-L185

const lineData_AS_IS = {
  labels: ["Qty", "More", "Sold", "Restocking", "Misc"],
  datasets: [
    {
      label: "Dataset 1",
      data: [32100, 23500, 53100, 42300, 12300]
    },
    {
      label: "Dataset 2",
      data: [34200, 53200, 42300, 21400, 0]
    },
    {
      label: "Dataset 3 long name",
      data: [41200, 23400, 34210, 1400, 42100]
    },
    {
      label: "Dataset 4 long name",
      data: [22000, 1200, 9000, 24000, 3000]
    },
    {
      label: "Dataset 5 long name",
      data: [2412, 30000, 10000, 5000, 31000]
    },
    {
      label: "Dataset 6 long name",
      data: [0, 20000, 40000, 60000, 80000]
    }
  ]
};

const lineTimeSeriesData_AS_IS = {
  datasets: [
    {
      label: "Dataset 1",
      data: [
        { date: new Date(2019, 0, 1), value: 10000 },
        { date: new Date(2019, 0, 5), value: 65000 },
        { date: new Date(2019, 0, 8), value: 10000 },
        { date: new Date(2019, 0, 13), value: 49213 },
        { date: new Date(2019, 0, 17), value: 51213 }
      ]
    },
    {
      label: "Dataset 2",
      data: [
        { date: new Date(2019, 0, 2), value: 0 },
        { date: new Date(2019, 0, 6), value: 57312 },
        { date: new Date(2019, 0, 8), value: 21432 },
        { date: new Date(2019, 0, 15), value: 70323 },
        { date: new Date(2019, 0, 19), value: 21300 }
      ]
    },
    {
      label: "Dataset 3",
      data: [
        { date: new Date(2019, 0, 1), value: 50000 },
        { date: new Date(2019, 0, 5), value: 15000 },
        { date: new Date(2019, 0, 8), value: 20000 },
        { date: new Date(2019, 0, 13), value: 39213 },
        { date: new Date(2019, 0, 17), value: 61213 }
      ]
    },
    {
      label: "Dataset 4",
      data: [
        { date: new Date(2019, 0, 2), value: 10 },
        { date: new Date(2019, 0, 6), value: 37312 },
        { date: new Date(2019, 0, 8), value: 51432 },
        { date: new Date(2019, 0, 15), value: 40323 },
        { date: new Date(2019, 0, 19), value: 31300 }
      ]
    }
  ]
};

TO BE:

The proposed data format has basically the same shape as a CSV converted to JSON.

const lineData_TO_BE = [
  { group: "Dataset 1", y: 32100, x: "Qty" },
  { group: "Dataset 1", y: 23500, x: "More" },
  { group: "Dataset 1", y: 53100, x: "Sold" },
  { group: "Dataset 1", y: 42300, x: "Restocking" },
  { group: "Dataset 1", y: 12300, x: "Misc" },
  
  { group: "Dataset 2", y: 34200, x: "Qty" },
  { group: "Dataset 2", y: 53200, x: "More" },
  { group: "Dataset 2", y: 42300, x: "Sold" },
  { group: "Dataset 2", y: 21400, x: "Restocking" },
  { group: "Dataset 2", y: 0, x: "Misc" },

  { group: "Dataset 3 long name", y: 41200, x: "Qty" },
  { group: "Dataset 3 long name", y: 23400, x: "More" },
  { group: "Dataset 3 long name", y: 34210, x: "Sold" },
  { group: "Dataset 3 long name", y: 1400, x: "Restocking" },
  { group: "Dataset 3 long name", y: 42100, x: "Misc" },

  { group: "Dataset 4 long name", y: 22000, x: "Qty" },
  { group: "Dataset 4 long name", y: 1200, x: "More" },
  { group: "Dataset 4 long name", y: 9000, x: "Sold" },
  { group: "Dataset 4 long name", y: 24000, x: "Restocking" },
  { group: "Dataset 4 long name", y: 3000, x: "Misc" },

  { group: "Dataset 5 long name", y: 2412, x: "Qty" },
  { group: "Dataset 5 long name", y: 30000, x: "More" },
  { group: "Dataset 5 long name", y: 10000, x: "Sold" },
  { group: "Dataset 5 long name", y: 5000, x: "Restocking" },
  { group: "Dataset 5 long name", y: 31000, x: "Misc" },

  { group: "Dataset 6 long name", y: 0, x: "Qty" },
  { group: "Dataset 6 long name", y: 20000, x: "More" },
  { group: "Dataset 6 long name", y: 40000, x: "Sold" },
  { group: "Dataset 6 long name", y: 60000, x: "Restocking" },
  { group: "Dataset 6 long name", y: 80000, x: "Misc" },
];
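A possible way to produce lineData_TO_BE from lineData_AS_IS is sketched below, assuming the label-based shape shown above (`labels` plus `datasets` of numeric arrays). The function name `labeledToTabular` is hypothetical and the input is trimmed to two datasets for brevity.

```javascript
// Hypothetical converter from the current label-based format
// ({ labels, datasets: [{ label, data: number[] }] }) to tabular rows.
function labeledToTabular({ labels, datasets }) {
  return datasets.flatMap((dataset) =>
    dataset.data.map((value, i) => ({
      group: dataset.label, // the dataset name becomes the group column
      y: value,
      x: labels[i],         // the i-th label pairs with the i-th value
    }))
  );
}

// A trimmed-down version of lineData_AS_IS
const asIs = {
  labels: ["Qty", "More", "Sold", "Restocking", "Misc"],
  datasets: [
    { label: "Dataset 1", data: [32100, 23500, 53100, 42300, 12300] },
    { label: "Dataset 2", data: [34200, 53200, 42300, 21400, 0] },
  ],
};

const tabular = labeledToTabular(asIs);
// tabular[0] → { group: "Dataset 1", y: 32100, x: "Qty" }
// tabular.length → 10
```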


const lineTimeSeriesData_TO_BE = [
  { group: "Dataset 1", x: new Date(2019, 0, 1), y: 10000 },
  { group: "Dataset 1", x: new Date(2019, 0, 5), y: 65000 },
  { group: "Dataset 1", x: new Date(2019, 0, 8), y: 10000 },
  { group: "Dataset 1", x: new Date(2019, 0, 13), y: 49213 },
  { group: "Dataset 1", x: new Date(2019, 0, 17), y: 51213 },
  
  { group: "Dataset 2", x: new Date(2019, 0, 2), y: 0 },
  { group: "Dataset 2", x: new Date(2019, 0, 6), y: 57312 },
  { group: "Dataset 2", x: new Date(2019, 0, 8), y: 21432 },
  { group: "Dataset 2", x: new Date(2019, 0, 15), y: 70323 },
  { group: "Dataset 2", x: new Date(2019, 0, 19), y: 21300 },
  
  { group: "Dataset 3", x: new Date(2019, 0, 1), y: 50000 },
  { group: "Dataset 3", x: new Date(2019, 0, 5), y: 15000 },
  { group: "Dataset 3", x: new Date(2019, 0, 8), y: 20000 },
  { group: "Dataset 3", x: new Date(2019, 0, 13), y: 39213 },
  { group: "Dataset 3", x: new Date(2019, 0, 17), y: 61213 },
  
  { group: "Dataset 4", x: new Date(2019, 0, 2), y: 10 },
  { group: "Dataset 4", x: new Date(2019, 0, 6), y: 37312 },
  { group: "Dataset 4", x: new Date(2019, 0, 8), y: 51432 },
  { group: "Dataset 4", x: new Date(2019, 0, 15), y: 40323 },
  { group: "Dataset 4", x: new Date(2019, 0, 19), y: 31300 }
]
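The time-series conversion is even more direct, since each `{ date, value }` object already corresponds to one row. A minimal sketch, assuming the lineTimeSeriesData_AS_IS shape above; `timeSeriesToTabular` is a hypothetical name and the input is shortened.

```javascript
// Hypothetical converter from the current time-series format
// ({ datasets: [{ label, data: [{ date, value }] }] }) to tabular rows.
function timeSeriesToTabular({ datasets }) {
  return datasets.flatMap(({ label, data }) =>
    data.map(({ date, value }) => ({ group: label, x: date, y: value }))
  );
}

// A shortened version of lineTimeSeriesData_AS_IS
const asIsTimeSeries = {
  datasets: [
    {
      label: "Dataset 1",
      data: [
        { date: new Date(2019, 0, 1), value: 10000 },
        { date: new Date(2019, 0, 5), value: 65000 },
      ],
    },
    {
      label: "Dataset 2",
      data: [{ date: new Date(2019, 0, 2), value: 0 }],
    },
  ],
};

const timeSeriesRows = timeSeriesToTabular(asIsTimeSeries);
// timeSeriesRows[2] → { group: "Dataset 2", x: <Date for 2019-01-02>, y: 0 }
```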

Problem 1: Column names

The two examples above use the strings group, x and y as column names (object keys), and more (such as y2 or color) could be added in the future. But, as @theiliad pointed out, Carbon Charts are more generic: they don’t assume that the horizontal axis is called X and the vertical one Y.

Another option would be to let the user specify the “axis - column” association as a configuration option. The dataset could then have the columns [date, country, value], and the configuration option would assign them:

axesColumns: {
  bottom: 'date',
  left: 'value',
  group: 'country',
}
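To make the idea concrete, here is a sketch of how chart internals might read each row through such a mapping, so that no column name is hard-coded. The `axesColumns` object is the one proposed above; the helper `readPoint` and the sample rows are hypothetical, with illustrative values only.

```javascript
// The axis-to-column mapping proposed above, as it might appear in the chart options
const axesColumns = {
  bottom: 'date',
  left: 'value',
  group: 'country',
};

// Hypothetical helper: read one row through the mapping,
// returning values keyed by role (bottom, left, group) instead of column name.
function readPoint(row, columns) {
  const point = {};
  for (const [role, column] of Object.entries(columns)) {
    point[role] = row[column];
  }
  return point;
}

// Illustrative rows with columns [date, country, value]
const data = [
  { date: new Date(2019, 0, 1), country: 'Italy', value: 10 },
  { date: new Date(2019, 0, 2), country: 'Italy', value: 20 },
];

const points = data.map((row) => readPoint(row, axesColumns));
// points[1] → { bottom: <Date for 2019-01-02>, left: 20, group: 'Italy' }
```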

Problem 2: Backwards compatibility

We think that backward compatibility could be achieved by supporting both formats for a limited period of time, transforming one into the other.
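One direction of that transformation could look like the sketch below: tabular input is detected and converted back into the current time-series shape, so existing internals could keep consuming `datasets` during the transition. Both `isTabular` (which simply assumes "array means tabular") and `tabularToTimeSeries` are hypothetical names, not part of Carbon Charts.

```javascript
// Hypothetical reverse converter: tabular rows { group, x, y } back into
// the current time-series shape { datasets: [{ label, data: [{ date, value }] }] }.
function tabularToTimeSeries(rows) {
  const byGroup = new Map(); // preserves first-seen group order
  for (const { group, x, y } of rows) {
    if (!byGroup.has(group)) byGroup.set(group, []);
    byGroup.get(group).push({ date: x, value: y });
  }
  return {
    datasets: Array.from(byGroup, ([label, data]) => ({ label, data })),
  };
}

// Hypothetical format check: the proposed format is a flat array,
// the current one is an object with a `datasets` property.
function isTabular(input) {
  return Array.isArray(input);
}

const input = [
  { group: "Dataset 1", x: new Date(2019, 0, 1), y: 10000 },
  { group: "Dataset 1", x: new Date(2019, 0, 5), y: 65000 },
  { group: "Dataset 2", x: new Date(2019, 0, 2), y: 0 },
];

const legacy = isTabular(input) ? tabularToTimeSeries(input) : input;
// legacy.datasets.length → 2; legacy.datasets[0].data[1].value → 65000
```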


Many things are still to be defined and we have many ideas; we are curious whether anyone has any thoughts!

(cc @lucafalasco @serenaG @ilariaventurini)

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
caesarsol commented, Feb 25, 2020

@theiliad I’ve reported what you said under Problem 1; sorry if I wasn’t clear! About the “non-significant advantage”, you may have a point. However, let me add one more pro: the simplicity of adding three-dimensional charts such as the Heatmap or the continuous-color-coded Scatterplot. I think the nested data structure for those could be pretty complicated.
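To illustrate the Heatmap point: in tabular form each cell is simply one row with three columns, so the axis categories and the color domain each come from a single pass over the data. The column names `x`, `y`, `value` and all values below are assumptions for illustration.

```javascript
// Illustrative heatmap data: one row per cell, three columns (hypothetical names and values)
const heatmapData = [
  { x: 'Mon', y: 'Morning', value: 3 },
  { x: 'Mon', y: 'Evening', value: 8 },
  { x: 'Tue', y: 'Morning', value: 5 },
  { x: 'Tue', y: 'Evening', value: 1 },
];

// Distinct categories for each axis, in first-seen order
const columnCategories = [...new Set(heatmapData.map((d) => d.x))];
const rowCategories = [...new Set(heatmapData.map((d) => d.y))];

// Extent of the third dimension, e.g. to drive a continuous color scale
const values = heatmapData.map((d) => d.value);
const colorDomain = [Math.min(...values), Math.max(...values)];
// columnCategories → ['Mon', 'Tue'], rowCategories → ['Morning', 'Evening'], colorDomain → [1, 8]
```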

@cal-smith thanks! I agree with you that we could also make function-based accessors; it’s a very common pattern in lodash, so it definitely makes sense. The only advantage I can think of for using strings over functions is that they are JSON-serializable, but I don’t know if that matters to you.
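The JSON-serializability difference can be seen directly with `JSON.stringify`, which keeps string values but omits function-valued properties. The `axis.bottom.map` option shape below follows the one discussed in this thread and is only illustrative.

```javascript
// Two equivalent accessor configurations: one by column name, one by function
const withString = { axis: { bottom: { map: 'supplier' } } };
const withFunction = { axis: { bottom: { map: (data) => data.supplier } } };

// JSON.stringify keeps the string but silently omits the function-valued property
const serializedString = JSON.stringify(withString);     // '{"axis":{"bottom":{"map":"supplier"}}}'
const serializedFunction = JSON.stringify(withFunction); // '{"axis":{"bottom":{}}}'

// Only the string version survives a round trip
const restored = JSON.parse(serializedString);
// restored.axis.bottom.map → 'supplier'
```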

1 reaction
cal-smith commented, Feb 25, 2020

I definitely prefer the tabular format over the current format; to my mind it’s significantly clearer.

I agree with

I’m not a fan of the x, y, x2, y2 idea since it doesn’t clearly define where the data would land (is y left and y2 right? what if we have an RTL chart?)

However, we can sort that out with some mapping options/functions, something to the effect of:


const data = [
    { supplier: 'foo inc', y: 66 },
    { supplier: 'bar corp', y: 25 },
    // ...
];

// in the options config
const options = {
    axis: {
        bottom: {
            map: (data) => data.supplier,
            // ...
        },
        left: {
            map: 'y',
            // ...
        }
    }
};

So the type would look like map: string | ((data) => any). Should we need grouping or other values to order by, we can use the same type signature.
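Internally, a `map` option of that union type could be normalized into a plain function once, so the rest of the code never has to branch on its type. A minimal sketch; `toAccessor` is a hypothetical helper name, not an existing Carbon Charts function.

```javascript
// Hypothetical helper: turn a `map` option of type string | ((data) => any)
// into a plain accessor function.
function toAccessor(map) {
  if (typeof map === 'function') return map;
  if (typeof map === 'string') return (row) => row[map]; // column-name lookup
  throw new TypeError('map must be a string or a function');
}

const data = [
  { supplier: 'foo inc', y: 66 },
  { supplier: 'bar corp', y: 25 },
];

const getBottom = toAccessor((row) => row.supplier); // function form
const getLeft = toAccessor('y');                     // string form

const pairs = data.map((d) => [getBottom(d), getLeft(d)]);
// pairs → [['foo inc', 66], ['bar corp', 25]]
```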

It should be fairly easy to write a function to map between the old format and a tabular format; worst case, it may also be feasible to support both.
