Allow new attributes to be added to DataSets

Description

I have certain attributes that I want to track within my datasets, and I created custom DataSets to get around this issue. Now that hooks are out, most of my reasons for custom DataSets are gone: I can achieve the same thing with an after_node_run hook. I still cannot attach custom attributes to datasets, though.

Use Case 1 (can I share this dataset)

I would like to attach things like confidentiality to the dataset so that team members can easily know who they can share a dataset with by looking at an attribute on the dataset. Ideally, I would like to add these to the catalog.

Use Case 2 (can I delete this sub_pipeline)

I would also like to check pipeline health in CI. One thing I would like to look for is dangling edges: outputs that nothing consumes. Sometimes during refactoring we switch to a new section of the pipeline; the old one gets disconnected but is never removed, and later we wonder whether anyone is using that output. It would be nice to have CI tell us that we need to either mark that dataset as a final output or remove that section of the pipeline.
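A rough sketch of the CI check described above, written in plain Python rather than against Kedro's API. The `(inputs, outputs)` node tuples, the `find_dangling_outputs` helper, and the `is_output` flag are all assumptions made for illustration; a real check would read node inputs and outputs from the pipeline object and the flag from wherever the attributes end up living.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical stand-in for a pipeline node: (input names, output names).
Node = Tuple[List[str], List[str]]


def find_dangling_outputs(
    nodes: Iterable[Node], attributes: Dict[str, dict]
) -> Set[str]:
    """Return outputs that no node consumes and that are not marked is_output."""
    nodes = list(nodes)
    produced = {name for _, outputs in nodes for name in outputs}
    consumed = {name for inputs, _ in nodes for name in inputs}
    return {
        name
        for name in produced - consumed
        # `is_output` is the proposed (assumed) attribute, defaulting to False.
        if not attributes.get(name, {}).get("is_output", False)
    }


nodes = [
    (["cars"], ["clean_cars"]),
    (["clean_cars"], ["report"]),
    (["cars"], ["old_summary"]),  # left behind after a refactor
]
attributes = {"report": {"is_output": True}}

dangling = find_dangling_outputs(nodes, attributes)
print(sorted(dangling))  # ['old_summary']
```

In CI, a non-empty result could fail the build, prompting someone to either mark the dataset as a final output or delete the disconnected section.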

Possible Implementation

cars:
  type: pandas.CSVDataSet
  filepath: data/01_raw/company/cars.csv
  attributes: # 👈 this is the proposed feature, not currently in the framework
    is_output: true
    confidentiality: public

Each AbstractDataset subclass would need to accept the attributes keyword and then attach the attributes to each instance.
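As a minimal sketch of that idea, assuming nothing about Kedro's internals: `AttributedDataSet` and `build_catalog` below are hypothetical stand-ins, not framework classes. They only show the `attributes` key being passed from a catalog-like config onto each instance, where a hook or script could then read it.

```python
from typing import Any, Dict, Optional


class AttributedDataSet:
    """Hypothetical stand-in for a dataset base class; not Kedro's actual API."""

    def __init__(self, filepath: str, attributes: Optional[Dict[str, Any]] = None):
        self.filepath = filepath
        # Copy so later changes to the caller's dict do not leak into the instance.
        self.attributes: Dict[str, Any] = dict(attributes or {})


def build_catalog(config: Dict[str, Dict[str, Any]]) -> Dict[str, AttributedDataSet]:
    """Build datasets from catalog-like config, passing `attributes` through."""
    return {
        name: AttributedDataSet(entry["filepath"], entry.get("attributes"))
        for name, entry in config.items()
    }


# Mirrors the proposed catalog entry above, expressed as a Python dict.
config = {
    "cars": {
        "type": "pandas.CSVDataSet",
        "filepath": "data/01_raw/company/cars.csv",
        "attributes": {"is_output": True, "confidentiality": "public"},
    }
}

catalog = build_catalog(config)
shareable = [
    name
    for name, dataset in catalog.items()
    if dataset.attributes.get("confidentiality") == "public"
]
print(shareable)  # ['cars']
```

Datasets declared without an `attributes` key simply get an empty dict, so existing catalog entries would keep working under this assumption.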

Issue Analytics

State:
Created 3 years ago
Reactions: 4
Comments: 10 (7 by maintainers)

Top GitHub Comments

2 reactions

WaylonWalker commented, Apr 14, 2021

Still dreaming of being able to add additional attributes to datasets so that I can access them in hooks. Is this something the Kedro team is interested in allowing?

1 reaction

tdrobbin commented, Jul 1, 2020