[Feature Request] Support for Hydra in Kedro
Description
Hydra is a framework for elegantly configuring complex applications. It builds a hierarchical configuration by splitting it across several YAML files, which makes it easier to organise. Project description: https://github.com/facebookresearch/hydra
When trying to use Hydra via the hydra.main() decorator applied on register_pipelines(), an error occurs.
Context
Having Kedro and Hydra working together would make it easier to maintain complex pipelines.
Reproducing issue
- python version: 3.8.12
- kedro version: 0.17.7
- hydra version: 1.1.1
The bug appears when applying the hydra.main() decorator to register_pipelines(). This decorator is used to build an OmegaConf config from the /conf directory. Steps to reproduce:
- Set up the iris_dataset toy project
- Add the files required by Hydra in the conf folder (config.yaml and base/master.yaml):
- conf/config.yaml:
defaults:
  - base: master
- conf/base/master.yaml:
defaults:
  - ./catalog
  - ./logging
  - ./parameters
- Rename the existing files' extension (.yml -> .yaml)
- Add the hydra.main decorator in src/[package_name]/pipeline_registry.py:
from typing import Dict

import hydra
from kedro.pipeline import Pipeline
from omegaconf import DictConfig

# de and ds are the project's pipeline modules, imported as in the starter project.

@hydra.main(config_path="../../conf", config_name="config")
def register_pipelines(cfg: DictConfig) -> Dict[str, Pipeline]:
    """Register the project's pipelines.

    Returns:
        A mapping from a pipeline name to a ``Pipeline`` object.
    """
    data_engineering_pipeline = de.create_pipeline()
    data_science_pipeline = ds.create_pipeline()
    return {
        "de": data_engineering_pipeline,
        "ds": data_science_pipeline,
        "__default__": data_engineering_pipeline + data_science_pipeline,
    }
This will result in the following error:
Primary config module 'get_started.conf' not found.
Check that it's correct and contains an __init__.py file
note: get_started is the name of the package in /src
Cause of the issue
After some digging, it appears that the configuration path resolved by hydra.main does not exist. The following info is obtained by running in debug mode, and setting a breakpoint on the first line of the function ensure_main_config_source_available(). Full path: hydra/_internal/config_loader_impl.py/ConfigLoaderImpl.ensure_main_config_source_available()
- When the bug appears, calling self.get_sources() while inside ConfigLoaderImpl.ensure_main_config_source_available returns this:
[provider=hydra, path=pkg://hydra.conf, provider=main, path=pkg://conf, provider=schema, path=structured://]
- It should actually be this:
[provider=hydra, path=pkg://hydra.conf, provider=main, path=file:///PATH_TO_PROJECT//conf, provider=schema, path=structured://]
It appears that Hydra doesn't know how to get file:///PATH_TO_PROJECT, and replaces it with pkg://
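To make the difference between the two source lists concrete, here is a rough model of how a relative config_path might be resolved depending on whether the module is run as a script or imported. This is an assumption pieced together from the output observed above, not Hydra's actual code; the function name model_search_path and its parameters are hypothetical.

```python
import posixpath


def model_search_path(module_name: str, source_file: str, config_path: str) -> str:
    """Hypothetical model of relative config_path resolution (not Hydra's real code)."""
    if module_name == "__main__":
        # Script launched directly: resolve relative to the file on disk.
        base_dir = posixpath.dirname(source_file)
        return "file://" + posixpath.normpath(posixpath.join(base_dir, config_path))

    # Imported module: walk up the dotted package name instead of the file system.
    package = module_name.rpartition(".")[0]
    while config_path.startswith("../"):
        config_path = config_path[len("../"):]
        package = package.rpartition(".")[0]
    return "pkg://" + (package + "/" if package else "") + config_path


registry_file = "/PATH_TO_PROJECT/src/get_started/pipeline_registry.py"

# Imported by another program (as presumably happens under kedro run).
imported = model_search_path("get_started.pipeline_registry", registry_file, "../../conf")

# Launched directly from the terminal.
as_script = model_search_path("__main__", registry_file, "../../conf")
```

Under this model the imported case runs out of parent packages before it runs out of "../" segments, which would explain why the main source ends up as a bare pkg://conf.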
Possible Implementation
I'm not really sure how to solve this, or which library should be adapted to fix the bug, so I wrote a similar post on Hydra's issue tracker.
Hydra requires that the script be launched by calling it manually in the terminal. I don't know what happens when executing kedro run, but I guess it comes from somewhere here.
Possible Alternatives
Right now I'm using a workaround that generates the config via initialize() and compose():
from typing import Dict

from hydra import compose, initialize
from kedro.pipeline import Pipeline

# de and ds are the project's pipeline modules, imported as in the starter project.

def register_pipelines() -> Dict[str, Pipeline]:
    """Register the project's pipelines.

    Returns:
        A mapping from a pipeline name to a ``Pipeline`` object.
    """
    # config_path is relative to this file (src/[package_name]/pipeline_registry.py).
    initialize(config_path="../../conf")
    cfg = compose(config_name="config")
    data_engineering_pipeline = de.create_pipeline()
    data_science_pipeline = ds.create_pipeline()
    return {
        "de": data_engineering_pipeline,
        "ds": data_science_pipeline,
        "__default__": data_engineering_pipeline + data_science_pipeline,
    }
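A variant of this workaround that does not depend on a relative path is sketched below. It assumes Hydra's initialize_config_dir(), which takes an absolute directory, used as a context manager so that the global Hydra state is released once the config has been composed. conf_dir() and load_config() are hypothetical helper names, and the parents[2] offset assumes the src/[package_name]/pipeline_registry.py layout described above.

```python
from pathlib import Path
from typing import Optional, Sequence


def conf_dir(registry_file: str) -> Path:
    """Return <project>/conf, assuming registry_file is <project>/src/<package>/pipeline_registry.py."""
    return Path(registry_file).resolve().parents[2] / "conf"


def load_config(config_dir: Path, config_name: str = "config",
                overrides: Optional[Sequence[str]] = None):
    """Compose a Hydra config from an absolute directory (hypothetical helper)."""
    # Imported lazily so that conf_dir() stays usable without Hydra installed.
    from hydra import compose, initialize_config_dir

    # The context manager clears Hydra's global state on exit, so the function
    # can be called more than once in the same process.
    with initialize_config_dir(config_dir=str(config_dir)):
        return compose(config_name=config_name, overrides=list(overrides or []))
```

Inside register_pipelines() this could be called as cfg = load_config(conf_dir(__file__)), optionally passing a list of Hydra override strings as the overrides argument.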
Follow up question
This also raises the question of how to do config overrides from the command line, a Hydra feature that is available when the user calls the script themselves from the command line. I guess it would be possible via the --config argument of kedro run, but I haven't tested it yet.
tl;dr: hydra.main() is called in an unusual way, which makes it impossible for Hydra to find the config folder.
Issue Analytics
- State:
- Created 2 years ago
- Reactions:4
- Comments:11 (5 by maintainers)

Hi @noklam sorry for the delay.
This is true: using Hydra via the CLI allows overriding parameters at runtime, or launching it in multirun mode (one command fires all configs). The CLI mode also creates a new output folder for each run, which proves useful in multirun. The compose API can only be used to create the config from the YAML files. In both cases, the config is accessible directly in the code, for example from the register_pipelines() function. Since Hydra does not have direct access to the CLI here, I made some adapters (repo: https://github.com/neltacigreb/kedroXhydra) to be able to test the 2 packages together.
Correct me if needed, but I feel that the main difference is that Kedro aims at simplicity in the config directory, while Hydra encourages more complex config folder structures, so as to make use of the override mechanism. They're similar on some subjects too (multirun, dynamic pipelines, overrides), some of which are already provided in the Kedro config.
As I continue using the two packages, I’ll focus on 2 features that could be a match in my opinion:
When I find some time I'll package my findings in a plugin 😃 Until then, if you think of some features that could be used in Kedro, I'd be glad to try them as well.
For the compose API, that's it exactly. In the repo I mentioned, there are 2 Hydra decorator adapters.
pipelines_registry()
My plan to make multirun usable is to generate many Kedro pipelines with different configurations, namespace them and assemble them into one big final pipeline.
I didn't know about the multirun hooks; I'll look into that first to see if it fits my app.