Allow new attributes to be added to DataSets

See the original GitHub issue (#400).

Description

I have certain attributes I want to track on my datasets, and I have created custom DataSets to work around the lack of support for them. Now that hooks are out, most of my reasons for custom DataSets are gone: I can achieve the same thing with an after_node_run hook. However, I still cannot attach custom attributes to datasets.

Use Case 1 (can I share this dataset?)

I would like to attach metadata such as a confidentiality level to each dataset, so that team members can tell who a dataset may be shared with just by checking an attribute on it. Ideally, I would like to declare these attributes in the catalog.
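To illustrate how a teammate (or a script) could answer "can I share this?" once such attributes exist, here is a minimal sketch. It assumes the catalog has already been parsed into a plain dict (for example with yaml.safe_load) and that each entry carries the proposed attributes block. The "salaries" entry and the "restricted" level are hypothetical, made up for illustration.

```python
# Sketch only: "attributes" is the proposed catalog key, not an existing
# framework feature. The catalog is shown as an already-parsed dict.
from typing import Any, Dict, List

catalog_config: Dict[str, Dict[str, Any]] = {
    "cars": {
        "type": "pandas.CSVDataSet",
        "filepath": "data/01_raw/company/cars.csv",
        "attributes": {"confidentiality": "public"},
    },
    "salaries": {  # hypothetical entry, for illustration only
        "type": "pandas.CSVDataSet",
        "filepath": "data/01_raw/company/salaries.csv",
        "attributes": {"confidentiality": "restricted"},
    },
}


def shareable_datasets(
    config: Dict[str, Dict[str, Any]], level: str = "public"
) -> List[str]:
    """Return the names of datasets whose confidentiality equals `level`.

    Entries without a confidentiality attribute never match, so they are
    treated as not shareable by default.
    """
    return sorted(
        name
        for name, entry in config.items()
        if entry.get("attributes", {}).get("confidentiality") == level
    )


print(shareable_datasets(catalog_config))  # ['cars']
```

Treating a missing attribute as "not shareable" is a deliberately conservative choice in this sketch; a team could just as well fail CI on any entry that lacks the attribute.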

Use Case 2 (can I delete this sub-pipeline?)

I would also like to check pipeline health in CI. One thing I would like to look for is dangling edges that serve no purpose. Sometimes during refactoring we switch to a new section of the pipeline; the old one gets disconnected but is never removed, and later we wonder whether anyone is using its output. It would be nice for CI to tell us that we need to either mark that dataset as a final output or remove that section of the pipeline.
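A CI check of this kind could look roughly like the sketch below. It assumes a dataset can be flagged with an is_output attribute (as in the proposed catalog entry) and represents nodes as plain dicts of input and output names rather than real framework objects, so it runs on its own. The function name find_dangling_outputs and the example node layout are hypothetical.

```python
# Sketch of a "dangling output" check. Nodes are plain dicts here, not
# framework Node objects; is_output is the proposed dataset attribute.
from typing import Dict, List, Set


def find_dangling_outputs(
    nodes: List[Dict[str, List[str]]],
    attributes: Dict[str, Dict[str, object]],
) -> Set[str]:
    """Return datasets that some node produces, no node consumes,
    and that are not explicitly marked with is_output: true."""
    produced: Set[str] = set()
    consumed: Set[str] = set()
    for node in nodes:
        produced.update(node["outputs"])
        consumed.update(node["inputs"])
    unused = produced - consumed
    return {
        name
        for name in unused
        if not attributes.get(name, {}).get("is_output", False)
    }


nodes = [
    {"inputs": ["cars"], "outputs": ["clean_cars"]},
    {"inputs": ["clean_cars"], "outputs": ["model"]},
    {"inputs": ["cars"], "outputs": ["old_features"]},  # disconnected branch
]
attributes = {"model": {"is_output": True}}

dangling = find_dangling_outputs(nodes, attributes)
print(sorted(dangling))  # ['old_features']
```

In CI, a non-empty result would fail the build and prompt the team either to mark the dataset with is_output: true or to delete the branch that produces it.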

Possible Implementation

cars:
  type: pandas.CSVDataSet
  filepath: data/01_raw/company/cars.csv
  attributes: # 👈 this is the proposed feature, not currently in the framework
    is_output: true
    confidentiality: public

AbstractDataset (and therefore every dataset class that subclasses it) would need to accept the attributes keyword and then attach the attributes to each instance.
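The shape of that change can be sketched in plain Python. The classes below are hypothetical stand-ins, not the framework's real AbstractDataset or pandas.CSVDataSet: a base class accepts an optional attributes mapping and stores it, and a catalog entry is turned into an instance by forwarding every key except "type" as a keyword argument.

```python
# Hypothetical stand-ins for the proposal; not the framework's real classes.
from typing import Any, Dict, Optional


class AttributedDataSet:
    """Toy base class: every subclass accepts an `attributes` keyword."""

    def __init__(self, attributes: Optional[Dict[str, Any]] = None) -> None:
        # Copy the mapping so later changes to the caller's dict do not leak in.
        self.attributes: Dict[str, Any] = dict(attributes or {})


class ToyCSVDataSet(AttributedDataSet):
    """Toy dataset that only records its filepath."""

    def __init__(
        self, filepath: str, attributes: Optional[Dict[str, Any]] = None
    ) -> None:
        super().__init__(attributes=attributes)
        self.filepath = filepath


# Build an instance from a catalog entry: everything except "type" is
# forwarded as keyword arguments, including the new "attributes" key.
entry = {
    "type": "pandas.CSVDataSet",
    "filepath": "data/01_raw/company/cars.csv",
    "attributes": {"is_output": True, "confidentiality": "public"},
}
kwargs = {key: value for key, value in entry.items() if key != "type"}
cars = ToyCSVDataSet(**kwargs)
print(cars.attributes["confidentiality"])  # public
```

Putting the keyword on the base class means existing dataset types pick up the feature without each one having to be edited, and a hook that receives dataset instances could then read instance.attributes directly.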

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 4
  • Comments: 10 (7 by maintainers)

Top GitHub Comments

2 reactions
WaylonWalker commented, Apr 14, 2021

Still dreaming of being able to add additional attributes to datasets so that I can access them in hooks. Is this something the kedro team is interested in allowing?

1 reaction
tdrobbin commented, Jul 1, 2020
