Understanding PipelineML and pipeline_ml

Hi @Galileo-Galilei. As I mentioned in another issue, I’m currently working on integrating my training and inference pipelines with PipelineML. Unfortunately, I’m confused about how inputs and outputs are handled, and I can’t wrap my head around it.

Context

My training pipeline is built from three other pipelines: de_pipeline (data engineering), fe_pipeline (feature engineering) and md_pipeline (training aka. modeling).

My inference pipeline is built from the same pipelines, but with a predict argument that changes their behavior (they use previously saved models for the imputer and for prediction).

In my current implementation it looks like this:

     de_pipeline_predict = pipeline(
         de.create_pipeline(predict=True),  # type: ignore
         inputs={"remote_raw": "remote_new", "imputer": "imputer"},
         namespace="new",
     )
     fe_pipeline_predict = pipeline(
         fe.create_pipeline(predict=True),  # type: ignore
         namespace="new",
     )

     # The `new_preds` output would be renamed to `new.new_preds` because of
     # the namespace, so we map `new_preds` to `new_preds` to retain the
     # name and keep the catalog clean.
     md_pipeline_predict = pipeline(
         md.create_pipeline(predict=True),  # type: ignore
         inputs={"lgbm": "lgbm"},
         outputs={"new_preds": "new_preds"},
         namespace="new",
     )

My pipelines also take parameters as inputs, obtained from the Kedro configuration (by that I mean conf/base/parameters.yaml).
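To make the renaming in the snippets above concrete, here is a minimal sketch of the naming rule as this issue describes it: a dataset gets the namespace prefix unless it is explicitly mapped, and parameter references keep their names (the error reported further down in this issue lists `params:data_engineering` without a `new.` prefix). The helper `namespaced_name` is hypothetical and is not part of Kedro's API; it is a simplified model of the behaviour, and other Kedro versions may differ.

```python
def namespaced_name(name, namespace, explicit=None):
    """Return the catalog name a dataset would end up with (simplified model).

    `explicit` plays the role of the `inputs=` / `outputs=` mappings passed
    to `pipeline(...)`. This helper is hypothetical, not a Kedro function.
    """
    explicit = explicit or {}
    if name in explicit:
        # Explicitly mapped names are used as given, without a prefix.
        return explicit[name]
    if name == "parameters" or name.startswith("params:"):
        # Assumption taken from the error output in this issue:
        # parameter references are left unprefixed.
        return name
    return f"{namespace}.{name}"


# Unmapped output: picks up the namespace prefix.
print(namespaced_name("new_preds", "new"))
# Mapped output: keeps the plain name, as in md_pipeline_predict.
print(namespaced_name("new_preds", "new", {"new_preds": "new_preds"}))
# Mapped input: renamed, as in de_pipeline_predict.
print(namespaced_name("remote_raw", "new", {"remote_raw": "remote_new"}))
```

Under this model, every name passed through `inputs=` or `outputs=` stays visible to the outer pipeline under the mapped name, which is why `lgbm`, `imputer` and `remote_new` show up unprefixed.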

When I’m trying to glue them together with:

     train_pipeline = de_pipeline + fe_pipeline + md_pipeline
     predict_pipeline = de_pipeline_predict + fe_pipeline_predict + md_pipeline_predict

     training = pipeline_ml(
         training=train_pipeline,
         inference=predict_pipeline,
     )

and running my training pipeline I’m getting:

     kedro_mlflow.pipeline.pipeline_ml.KedroMlflowPipelineMLInputsError:
         The following inputs are free for the inference pipeline:
         - lgbm
         - remote_new
         - imputer
         - params:data_engineering
         - params:target.
         Only one free input is allowed.
         Please make sure that 'inference' pipeline inputs are 'training' pipeline outputs,
         except one.

I understand the issue here, but I don’t know how to proceed with it (that is, how to “un-free” the inputs that should be obtained (automatically?) using Kedro features). I would be glad for any tips.
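For readers following along, the rule stated in the error message can be modelled in a few lines of plain Python. This is only a sketch of the rule as the message words it (inference inputs must be training outputs, except one), not kedro-mlflow's actual implementation. The function names and the `hypothetical_training_outputs` list are assumptions made for illustration; the input names are copied from the error above.

```python
def free_inference_inputs(inference_inputs, training_outputs):
    """Inference inputs that no training node produces, sorted by name."""
    return sorted(set(inference_inputs) - set(training_outputs))


def check_single_free_input(inference_inputs, training_outputs):
    """Return the single free input, or raise if there is not exactly one."""
    free = free_inference_inputs(inference_inputs, training_outputs)
    if len(free) != 1:
        raise ValueError(
            f"Expected exactly one free input, found {len(free)}: {free}"
        )
    return free[0]


# Names copied from the error message in this issue.
inference_inputs = [
    "lgbm",
    "remote_new",
    "imputer",
    "params:data_engineering",
    "params:target",
]

# Hypothetical scenario: training exposes the fitted objects under the
# same names that the inference pipeline reads.
hypothetical_training_outputs = ["lgbm", "imputer"]

print(free_inference_inputs(inference_inputs, hypothetical_training_outputs))
```

In this model, matching the names of the fitted objects between the two pipelines removes `lgbm` and `imputer` from the list, but the two `params:` entries still count as free alongside `remote_new`, so the check would keep failing until the parameters are dealt with as well.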

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 11 (6 by maintainers)

Top GitHub Comments

laurids-reichardt commented, Jul 20, 2020 (1 reaction)

I converted the scikit-learn classifier pipeline to a kedro pipeline as well: https://github.com/laurids-reichardt/kedro-examples/blob/master/text-classification/docs/kedro-pipeline.svg

Galileo-Galilei commented, Feb 21, 2021 (0 reactions)
