Understanding PipelineML and pipeline_ml

Hi @Galileo-Galilei. As I mentioned in another issue, I’m currently working on integrating my training and inference pipelines with PipelineML. Unfortunately, I’m confused about how to handle inputs and outputs, and I can’t wrap my head around it.

Context

My training pipeline is built from three other pipelines: de_pipeline (data engineering), fe_pipeline (feature engineering) and md_pipeline (training aka. modeling).

My inference pipeline is built from the same pipelines, but with a predict argument that changes their behavior (they use previously saved models for the imputer and for prediction).

In my current implementation it looks like this:

     de_pipeline_predict = pipeline(
         de.create_pipeline(predict=True),  # type: ignore
         inputs={"remote_raw": "remote_new", "imputer": "imputer"},
         namespace="new",
     )
     fe_pipeline_predict = pipeline(
         fe.create_pipeline(predict=True),  # type: ignore
         namespace="new",
     )

     # The `new_preds` output would be renamed to `new.new_preds` because of
     # the namespace, so we map `new_preds` to `new_preds` explicitly to retain
     # the name and keep the catalog clean.
     md_pipeline_predict = pipeline(
         md.create_pipeline(predict=True),  # type: ignore
         inputs={"lgbm": "lgbm"},
         outputs={"new_preds": "new_preds"},
         namespace="new",
     )

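To spell out the renaming rule the comment above relies on, here is a minimal sketch in plain Python. It is a simplified model, not Kedro's implementation: the helper `apply_namespace` is hypothetical, and it ignores parameters entirely.

```python
def apply_namespace(names, namespace, mapping=None):
    """Simplified model of namespacing (hypothetical helper, not Kedro code).

    Names listed in `mapping` are renamed to their mapped value;
    every other name gets the `<namespace>.` prefix.
    """
    mapping = mapping or {}
    renamed = {}
    for name in names:
        if name in mapping:
            renamed[name] = mapping[name]
        else:
            renamed[name] = f"{namespace}.{name}"
    return renamed


# Without an explicit mapping, the output picks up the namespace prefix.
unmapped = apply_namespace(["new_preds"], "new")

# With the identity mapping used above, the original name is retained.
mapped = apply_namespace(["new_preds"], "new", {"new_preds": "new_preds"})

print(unmapped)
print(mapped)
```

The same reasoning is why `lgbm` and `imputer` are mapped to themselves in the `inputs` arguments: they should keep their catalog names.
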
My pipelines also take parameters as inputs, obtained from the Kedro configuration (by that I mean conf/base/parameters.yaml).

When I’m trying to glue them together with:

     train_pipeline = de_pipeline + fe_pipeline + md_pipeline
     predict_pipeline = de_pipeline_predict + fe_pipeline_predict + md_pipeline_predict

     training = pipeline_ml(
         training=train_pipeline,
         inference=predict_pipeline,
     )

and running my training pipeline I’m getting:

     kedro_mlflow.pipeline.pipeline_ml.KedroMlflowPipelineMLInputsError:
         The following inputs are free for the inference pipeline:
         - lgbm
         - remote_new
         - imputer
         - params:data_engineering
         - params:target.
         Only one free input is allowed.
         Please make sure that 'inference' pipeline inputs are 'training' pipeline outputs,
         except one.
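
To make the rule in the message concrete, here is how I read it, as a minimal sketch in plain Python. This is not kedro-mlflow's actual code: the `free_inputs` helper and the `hypothetical_training_outputs` set are made up purely to illustrate the set arithmetic, and I am not claiming that parameters can really be exposed as training outputs this way.

```python
def free_inputs(inference_inputs, training_outputs):
    """Inference inputs that no training output provides (hypothetical helper)."""
    return set(inference_inputs) - set(training_outputs)


# The five inputs listed in the error message.
inference_inputs = {
    "lgbm",
    "remote_new",
    "imputer",
    "params:data_engineering",
    "params:target",
}

# Situation in the error: none of these names is matched by a training output.
current = free_inputs(inference_inputs, training_outputs=set())

# Assumption for illustration only: if training exposed everything except the
# new raw data, exactly one free input would remain.
hypothetical_training_outputs = {
    "lgbm",
    "imputer",
    "params:data_engineering",
    "params:target",
}
desired = free_inputs(inference_inputs, hypothetical_training_outputs)

print(sorted(current))
print(sorted(desired))
```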

I understand the issue here, but I don’t know how to proceed (how to “un-free” the inputs that should be obtained, perhaps automatically, through Kedro features). I would be glad for any tips.