
Modular Pipelines


Description

We’ve seen something incredible evolve through continued use of Kedro. Teams around the world are starting to use Kedro to create stores of reusable pipelines.

Last year, we introduced basic support for Modular Pipelines and this year we’re doubling down on this area.

In our world, a modular pipeline is a series of generalised, connected Python functions with inputs and outputs (see the sketch after this list). A modular pipeline:

  • Can be easily added to an existing or new Kedro project
  • Has virtually no learning curve, if you know how to use Kedro
  • Can be tested in isolation, to ensure high-quality code
  • Does not have a Kedro version dependency (related to #219)
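For illustration, here is a minimal sketch of what such a pipeline can look like; the module layout and the node function below are hypothetical, not something prescribed by the issue:

```python
# Minimal sketch of a modular pipeline: a create_pipeline() factory built from
# plain Python functions, with generic dataset names and no project-specific
# wiring, so it can be dropped into any Kedro project. `clean` is a placeholder.
from kedro.pipeline import Pipeline, node


def clean(raw_data):
    """Stand-in for a real processing step; assumes a pandas-like input."""
    return raw_data.dropna()


def create_pipeline(**kwargs) -> Pipeline:
    return Pipeline(
        [
            node(clean, inputs="raw_data", outputs="clean_data", name="clean"),
        ]
    )
```

Because the dataset names are generic, the same pipeline can be wired into different projects (or instantiated several times within one project) purely through the data catalog and the mapping helpers discussed below.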

Context

The final evolution of Modular Pipelines will see an ecosystem of reusable pipelines. However, for now we want to focus on allowing users to easily add pre-assembled pipelines to an existing or new Kedro project and export their own pre-assembled pipelines.

Next steps

Give us feedback if you’ve tried Modular Pipelines and the basic support we currently have for them, such as pipeline.transform(). Modular Pipelines also have implications for kedro-viz, and we can’t wait to show you what we have in mind for this.

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 2
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

1 reaction
lorenabalan commented, Apr 21, 2020

Also, as an update (for whoever is interested): we’re looking to include this feature in the next breaking release (0.16.0). We’ve merged the pipeline() helper, a slightly cleaner alternative to Pipeline.transform() (which we’re dropping), for mapping input/output/parameter names and for namespacing (prefixing) dataset and node names. There’s also work being done on the CLI side to help with the workflow of creating and working with modular pipelines: generating a new pipeline, packaging an existing pipeline, and pulling an existing pipeline from somewhere and integrating it into a Kedro project.
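For anyone who has not tried the helper yet, here is a rough sketch of how it can be used, assuming a Kedro version where pipeline() is importable from kedro.pipeline and accepts inputs/outputs/parameters/namespace keyword arguments; all dataset and node names below are hypothetical:

```python
# Sketch: re-use one modular pipeline twice with the pipeline() helper,
# remapping its free input and namespacing the remaining dataset/node names.
from kedro.pipeline import Pipeline, node, pipeline


def clean(raw_data):
    """Hypothetical node function."""
    return raw_data


base = Pipeline(
    [node(clean, inputs="raw_data", outputs="clean_data", name="clean")]
)

# Two instances of the same pipeline: each maps "raw_data" onto a concrete
# catalog entry, and the namespace prefix keeps the remaining dataset and
# node names of the two copies from colliding.
customers = pipeline(base, inputs={"raw_data": "customers_raw"}, namespace="customers")
orders = pipeline(base, inputs={"raw_data": "orders_raw"}, namespace="orders")

combined = customers + orders  # pipelines compose with +
```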

1 reaction
EigenJT commented, Apr 15, 2020

@yetudada Not sure if this is where you’d like the feedback, but this is essentially how we’ve been building all our pipelines. One of the sticking points I’ve found is how to write tests that ensure the pipelines work within a kedro context. What I’ve resorted to doing is writing tests that create temporary kedro projects, then test the pipelines within them.
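One lighter-weight pattern that may complement the temporary-project approach is running a modular pipeline directly against an in-memory catalog inside a test. A rough sketch, assuming a Kedro version in the 0.16–0.18 range (where MemoryDataSet lives in kedro.io); the package path and dataset names are hypothetical:

```python
# Rough sketch: exercise a modular pipeline in a unit test by running it with
# SequentialRunner against an in-memory DataCatalog, no project scaffold needed.
import pandas as pd
from kedro.io import DataCatalog, MemoryDataSet
from kedro.runner import SequentialRunner

# Hypothetical import path for the pipeline under test.
from my_package.pipelines.preprocessing import create_pipeline


def test_pipeline_produces_clean_data():
    catalog = DataCatalog(
        {"raw_data": MemoryDataSet(pd.DataFrame({"x": [1.0, None, 3.0]}))}
    )
    # run() returns the pipeline's free (unregistered) outputs as a dict.
    outputs = SequentialRunner().run(create_pipeline(), catalog)
    assert "clean_data" in outputs
```

This does not replace end-to-end tests inside a real Kedro project, but it keeps most pipeline logic testable without the temporary-project setup.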


Top Results From Across the Web

  • Modular pipelines — Kedro 0.18.4 documentation
    Modular pipelines allow you to instantiate pipelines multiple times, while allowing the user to override inputs/outputs/parameters. They are reusable within the ...
  • MPL - Modular Pipeline Library - Jenkins
    The modular pipeline library (MPL) we created is a highly-flexible shared library for a Jenkins Pipeline that enables easy sharing of best ...
  • PynPoint: a modular pipeline architecture for processing and ...
    The architecture of PynPoint has a modular design which separates the common data-handling functionalities that are required for all reduction steps from ...
  • Towards a Modular Future: Reimagining and Rebuilding ...
    Kedro is an open-source framework for creating portable pipelines through modular data science code, and provides a powerful interactive visualisation tool ...
  • Building Modular Pipelines in Azure Data Factory ... - Pinterest
    May 31, 2020 - Azure Data Factory pipelines are powerful and can be complex. In this post, I share some lessons and practices ...
