Understanding PipelineML and pipeline_ml
Hi @Galileo-Galilei. As I mentioned in the other issue, I’m currently working on integrating my training and inference pipelines with PipelineML. Unfortunately, I’m confused about how inputs and outputs are handled, and I can’t wrap my head around it.
Context
My training pipeline is built from three other pipelines: de_pipeline (data engineering), fe_pipeline (feature engineering) and md_pipeline (training, a.k.a. modeling).
My inference pipeline is built from the same pipelines, but with a predict argument that changes their behavior (they use previously saved models for the imputer and for prediction).
In my current implementation it looks like this:
de_pipeline_predict = pipeline(
    de.create_pipeline(predict=True),  # type: ignore
    inputs={"remote_raw": "remote_new", "imputer": "imputer"},
    namespace="new",
)
fe_pipeline_predict = pipeline(
    fe.create_pipeline(predict=True),  # type: ignore
    namespace="new",
)
# The `new_preds` output would be renamed to `new.new_preds` because of
# the namespace, so we map `new_preds` to `new_preds` to retain the
# name and keep the catalog clean.
md_pipeline_predict = pipeline(
    md.create_pipeline(predict=True),  # type: ignore
    inputs={"lgbm": "lgbm"},
    outputs={"new_preds": "new_preds"},
    namespace="new",
)
My pipelines also take parameters as inputs, read from the Kedro configuration (by that I mean conf/base/parameters.yaml).
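To make the renaming described in the code comment concrete, here is a minimal plain-Python sketch of the rule as I understand it: every dataset name gets the namespace prefix unless it is explicitly mapped. The helper `apply_namespace` and the dataset name `features` are hypothetical and only for illustration; this does not call Kedro and leaves out how parameters are handled.

```python
def apply_namespace(names, namespace, mapping=None):
    """Model of the renaming rule: explicitly mapped names keep their
    mapped value, all other names are prefixed with '<namespace>.'.
    Illustration only; this is not Kedro's implementation."""
    mapping = mapping or {}
    return {name: mapping.get(name, f"{namespace}.{name}") for name in names}


# `lgbm` and `new_preds` are mapped to themselves, as in md_pipeline_predict;
# `features` is a hypothetical intermediate dataset with no mapping.
renamed = apply_namespace(
    ["lgbm", "features", "new_preds"],
    namespace="new",
    mapping={"lgbm": "lgbm", "new_preds": "new_preds"},
)
print(renamed)
# {'lgbm': 'lgbm', 'features': 'new.features', 'new_preds': 'new_preds'}
```

Under this model, mapping a name to itself is simply a way to opt that dataset out of the prefix.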
When I try to glue them together with:
train_pipeline = de_pipeline + fe_pipeline + md_pipeline
predict_pipeline = de_pipeline_predict + fe_pipeline_predict + md_pipeline_predict
training = pipeline_ml(
    training=train_pipeline,
    inference=predict_pipeline,
)
and run my training pipeline, I get:
kedro_mlflow.pipeline.pipeline_ml.KedroMlflowPipelineMLInputsError:
The following inputs are free for the inference pipeline:
- lgbm
- remote_new
- imputer
- params:data_engineering
- params:target.
Only one free input is allowed.
Please make sure that 'inference' pipeline inputs are 'training' pipeline outputs,
except one.
I understand what the error is saying, but I don’t know how to proceed: how do I “un-free” inputs that should be obtained (automatically?) through Kedro features? I would be glad for any tips.
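For reference, the rule stated in the error message can be modelled in a few lines of plain Python. This is only a sketch of the check as the message describes it, not kedro-mlflow's actual code; the helper `free_inputs` is hypothetical, and the assumption that the training pipeline outputs `lgbm` and `imputer` under exactly those names is mine.

```python
def free_inputs(inference_inputs, training_outputs, input_name):
    """Return the inference inputs that are neither produced by the
    training pipeline nor the single allowed free input.
    Illustration of the rule in the error message only."""
    return sorted(set(inference_inputs) - set(training_outputs) - {input_name})


# Inputs listed as free in the error message above.
inference_inputs = [
    "lgbm",
    "remote_new",
    "imputer",
    "params:data_engineering",
    "params:target",
]

# Assumption: training outputs `lgbm` and `imputer` under these exact names,
# and `remote_new` is the one input supplied at prediction time.
remaining = free_inputs(
    inference_inputs,
    training_outputs={"lgbm", "imputer"},
    input_name="remote_new",
)
print(remaining)
# ['params:data_engineering', 'params:target']
```

Under that assumption, the dataset names stop being free once they match training outputs, and only the two parameter entries are left to deal with.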
I converted the scikit-learn classifier pipeline to a kedro pipeline as well: https://github.com/laurids-reichardt/kedro-examples/blob/master/text-classification/docs/kedro-pipeline.svg
This issue is closed since :
Feel free to reopen if needed.