Assess `OmegaConf` as replacement for `anyconfig`

Introduction

As part of the work on improving configuration in Kedro we should assess alternatives for anyconfig. anyconfig was initially chosen because it supports reading configuration of lots of different types. In practice, most users use yml files and so we might be able to use an alternative library that offers better functionality for yml configuration.

Task

Assess the following alternatives ordered by preference:

  1. OmegaConf
  2. Hydra (specifically the compose API)
  3. Dynaconf (the least preferred option, but already used in Kedro).

Issue Analytics

  • State: closed
  • Created a year ago
  • Reactions: 2
  • Comments:12 (10 by maintainers)

Top GitHub Comments

7 reactions
pierrejeden commented, Jul 1, 2022

First: thanks for your great work Kedro team!

We are just about to migrate a project that makes heavy use of OmegaConf to Kedro (if successful, we’ll adopt Kedro as a project standard) and were just testing out the integration of the two. So +1 for this!

I wanted to describe our use case for OmegaConf, since we seem to have more-complex-than-average config requirements, where OmegaConf has served us well (and we have considered adopting Hydra).

  1. We expose somewhat complex configuration to users, for example:

    • an arbitrary-length list of data sources of some given types, with their respective options.

    • User can select from a set of use cases, corresponding to different chunks of business logic, each with their own options (including additional datasources).

  2. For the backend we have exposed most of the parameters of the business logic as well as parts of the control flow:

    • Several chained models / tasks with their respective parameters, including different versions of these. This corresponds to namespaced sub-pipelines, with their own sub-configs.

    • Open to e.g., data engineering ad-hoc solutions or customisations, by overriding the default datasource pointers

  3. So far we have opted to use the “environments” directories to denote different user-setup configs (with their respective catalogs). Kedro’s configuration, AFAIU, is a bit “one-dimensional” for lack of a better word: the order of precedence nicely covers the running environments (as in local, experiment A, B, …, staging, production), but I have a feeling we have used up this dimension to encode the user setup. I’m not sure how this will play with the other aspects (local, experiment) that we wish to modify independently, but I believe Hydra could handle this nicely.

The features we use in OmegaConf that we would like to keep when moving to Kedro:

  1. Can’t help putting this here: attribute access is very neat 😃

  2. OmegaConf supports typed configs in a nice way (what they call structured configs). We use this much as the conf/base in Kedro, with the config schema defined as nested dataclasses. Main benefits of using the structured config:

    • Python-style typing, throwing hard errors

    • The sub-config dataclasses are used as type hints of the “orchestrating functions” (corresponding to Kedro pipelines)

    • Optional/required fields and default values are defined in the dataclass

    • OmegaConf has special types for string and value interpolation, which means that the logic of distributing config values to all the places they are needed can be done in the dataclass using the yaml syntax.

  3. OmegaConf interpolation possibly has some additional features on top of JMESPath, but I’m not sure here:

    • relative paths ${.key_of_this_level}, ${..key_of_parent_level}

    • nesting of interpolation is allowed

  4. We have used custom resolvers, which are very powerful. However, my feeling is that these should be used with caution, since they might hide important logic in some obscure place (and before one knows it, one has a DSL without a spec…)

  5. We have looked into Hydra but not adopted it yet, mostly because it is simple to get started with overrides in OmegaConf (e.g., for experimentation and comparison of parameter values or versions), and we haven’t hit the wall here yet. Also, AFAIU, it enforces a config directory structure, which I suspect would become messy in our case.

Nice-to-haves for OmegaConf / Hydra / other

(Note that I’m new to Kedro, so the below might not appreciate existing features or the ideas behind the design choices.)

  1. Support for the OmegaConf interpolation features (I guess this is a given)

  2. An interface for “structured configs”, to point to base config dataclass in the config loader.

  3. Possibly with a convention or way to select between different config schemas.

  4. Some flexibility in what parts of the catalog / config that are strict, so it’s not all or nothing:

    • Important production IO datasets might benefit from being part of the structured config
    • Adding / removing e.g. local intermediate datasets will probably be harder to work with if the whole catalog / config schema is strict
    • (The structured config might be out of place in the catalog however, since we have typing of the parameters in the dataset classes)
  5. (Probably a bad idea discarded long ago for good reasons) Possibly allow for single-yaml-file overrides; in development there is a lot of switching between the various yaml-files in the conf directory, and it would be nice to prototype in one place.

  6. and +1 for some iteration and branching abilities (that are not jinja), even if this is a long shot. Like if Hydra’s multi-run feature was part of the config language.

4 reactions
DavidRoschewitzQB commented, Jul 1, 2022

Thanks for the tag @datajoely. Happy to share some of our thoughts and considerations.

There are a few reasons why we chose to use Hydra (specifically the compose API) in our prototype (some of these might be pros or cons for configuration in Kedro):

  • The entry point is a single .yaml file, which serves as the “root” of all other configuration. There is therefore no need to search or loop over various files with a pattern; the combination of files is done explicitly in config.
  • The defaults list is neat syntax and allows for importing config from other files, which can then be overridden by the user if desired. This creates a hierarchical config tree.
    • It is then possible to define, via packaging, the key or location under which the imports are placed.
  • Due to the way Hydra generates a nested config dictionary, what we are treating as the namespace (essentially the location of any key in the tree) is generated by Hydra and can be extracted when, for example, passing it to Kedro.
  • Dependency injection is supported out of the box.

I’m certain OmegaConf would allow for most of this functionality, but potentially requiring additional logic on top of base OmegaConf. One thing we have not tried is leveraging OmegaConf functionality (e.g. custom resolvers) together with Hydra.

Some peculiarities that are good to be aware of:

  • Hydra recognises only .yaml (not .yml) files
  • Hydra/OmegaConf support only value interpolation, not key interpolation (see the Stack Overflow answer from the main contributor)

And lastly one consideration (when comparing e.g. with jinja) is that there is no support for conditionals or looping - not necessarily a dealbreaker, but potentially a limitation.

Do let me know if I can help clarify any points further. Exciting topic!


