[KED-1224] Connect Parameters and the Data Catalog

Description

I have a list of annotation names that I’d like to pass to a DataSet constructor as well as to particular pipeline nodes. There could easily be other cases where a node needs to know which parameter(s) a dataset was loaded with. You can always duplicate the list in both YAML files, but it would be better to specify the parameter in only one place, especially if it’s a parameter you want to experiment with.

For a more motivating use case, consider an mp4_file dataset where I have a frame rate I’d like to load the dataset with. So frame_rate is one of the dataset’s arguments, and nodes might need to access the frame_rate.

Possible Implementations

  1. Is it possible to use parameters in the same manner as the credentials.yml file within the catalog.yml file? For example:

    my_dataset.mp4:
      filepath: /some/path.mp4
      parameters: frame_rate
      ...

This would certainly get around the issue, and you can just inject the frame_rate parameter into the catalog entry by name.

  2. Some extra pipeline node syntax, like the one already used for parameters, e.g. "my_dataset.mp4:frame_rate" to connect the frame_rate parameter of a catalog entry my_dataset.mp4 to a node.

  3. The answer I don’t want is ‘you could just return a dict of metadata with your loaded dataset object’. It’s not pretty, and I don’t want to write every node to accept dictionary objects or tuples, which makes it awkward to treat them as plain functions later on.

edit: formatting

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

3 reactions
WaylonWalker commented, Jun 3, 2020

The updated link was broken again; here is one that worked for me today. 😃 TemplatedConfigLoader

0 reactions
DmitriiDeriabinQB commented, Jul 20, 2020

I’m closing this issue. The suggested solution would be to utilise TemplatedConfigLoader as described above.
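
To illustrate the idea behind the suggested solution, here is a minimal, self-contained sketch of placeholder templating: a single shared value is substituted into both the catalog entry and the parameters, so frame_rate is defined in only one place. This is not Kedro's implementation and does not call Kedro at all; the helper name fill_template and the shared dictionary are hypothetical, and the ${...} placeholder style is assumed to mirror what TemplatedConfigLoader accepts. Check the Kedro documentation for the loader's actual constructor arguments in your version.

```python
import re
from typing import Any, Mapping

# Matches placeholders of the form ${name}.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def fill_template(node: Any, values: Mapping[str, Any]) -> Any:
    """Recursively replace ${name} placeholders in a nested config (hypothetical helper)."""
    if isinstance(node, dict):
        return {key: fill_template(val, values) for key, val in node.items()}
    if isinstance(node, list):
        return [fill_template(item, values) for item in node]
    if isinstance(node, str):
        whole = _PLACEHOLDER.fullmatch(node)
        if whole:
            # The entire string is one placeholder: keep the value's original type.
            return values[whole.group(1)]
        # Placeholder embedded in a longer string: substitute its text form.
        return _PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), node)
    return node


# Single source of truth (assumed to play the role of a shared globals file).
shared = {"frame_rate": 30}

# Stand-in for a catalog.yml entry after YAML parsing.
catalog = fill_template(
    {"my_dataset.mp4": {"filepath": "/some/path.mp4", "frame_rate": "${frame_rate}"}},
    shared,
)

# Stand-in for parameters.yml after YAML parsing.
parameters = fill_template({"frame_rate": "${frame_rate}"}, shared)
```

With this arrangement the dataset receives frame_rate as a constructor argument from the catalog entry, and nodes can request the same value through the parameters, without the value being written twice. A placeholder with no matching key raises KeyError in this sketch, which surfaces typos early.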
