Remove kedro.line_magic entrypoint

This issue has changed quite a bit since it was originally written. Look at the final post here to see what we now need to do.

Original issue here for posterity.


Description

Is your feature request related to a problem? A clear and concise description of what the problem is: “I’m always frustrated when …”

I am the author of several Kedro plugins, some open source (kedro-mlflow) and some written at work. It turns out that most plugins perform one or more of the following tasks:

  • provide CLI commands
  • modify pipeline/catalog/node behaviour with hooks
  • add new datasets

Since Kedro is a data science framework, and since Jupyter is a standard tool for the exploration phase of a data science project, it seems natural to make Kedro compatible with Jupyter. For now, Kedro offers the following functionality:

  • set up the Kedro configuration (bootstrapping the project, creating a session and activating it)
  • declare some global variables accessible from the shell (catalog, session, context)
  • register custom “line_magic” from plugins
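Plugins expose their line magics through the `kedro.line_magic` entry point group named in this issue's title. As a rough illustration of how such a group can be listed, here is a minimal sketch using the standard library's `importlib.metadata`; it is not Kedro's own loader, and `discover_line_magics` is a hypothetical helper.

```python
from importlib.metadata import entry_points

# Entry point group named in this issue's title.
LINE_MAGIC_GROUP = "kedro.line_magic"


def discover_line_magics(group: str = LINE_MAGIC_GROUP) -> list:
    """Return the entry points installed under `group` (not loaded yet)."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+ exposes .select(group=...)
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # Python 3.8/3.9 return a dict of lists


for ep in discover_line_magics():
    # ep.load() would import the plugin module and return the magic function.
    print(f"found line magic {ep.name!r} -> {ep.value}")
```

With no plugin installed, the list is simply empty; loading is deferred so that a broken plugin only fails when its magic is actually imported.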

This works well for the core library, but the user experience would be more pleasant if plugin compatibility with notebooks were improved through the following actions:

  • add documentation on how to create a custom line magic in your plugin, because this feature is not documented yet. I had to read the code to discover it, and I struggled a little to make it work. I can make a PR, maybe by adding a sub-section to the plugin section?
  • add the possibility to execute some functions / push global variables to the notebook session when it is opened via a Kedro CLI command

Context

Why is this change important to you? How would you use it? How can it benefit other users?

It is a common pattern to “set up” your plugin, e.g. instantiate a connection to an external service (kedro-mlflow creates a connection to the mlflow server, and I have a SAS plugin which creates a connection to the database…), especially in the “before_pipeline_run” hook.

If I create a custom line_magic (say %reload_kedro_mlflow), it is discovered by Kedro and is accessible inside the notebook or the IPython session. However, I’d like it to run automatically when the notebook is opened via the Kedro CLI, instead of forcing the user to run it manually.
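Running the magic automatically amounts to calling it programmatically once the shell is ready. A minimal sketch, assuming the plugin's magic is already registered: IPython's `find_line_magic` returns `None` for unknown magics and `run_line_magic(name, line)` is the programmatic equivalent of typing `%name line` in a cell. `SETUP_MAGICS` and `run_setup_magics` are hypothetical names, and this does not solve the ordering problem with `%reload_kedro`.

```python
# Hypothetical list of plugin magics that should run once when the notebook starts.
SETUP_MAGICS = ("reload_kedro_mlflow",)


def run_setup_magics(ipython, names=SETUP_MAGICS) -> list:
    """Run each registered line magic once; return the names that actually ran."""
    ran = []
    for name in names:
        if ipython.find_line_magic(name) is None:
            continue  # plugin not installed, or its magic is not registered yet
        ipython.run_line_magic(name, "")  # same as typing `%<name>` with no arguments
        ran.append(name)
    return ran


def load_ipython_extension(ipython) -> None:
    """Hypothetical `%load_ext` hook that triggers the setup magics."""
    run_setup_magics(ipython)
```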

Possible Implementation

(Optional) Suggest an idea for implementing the addition or change.

Possible Alternatives

(Optional) Describe any alternative solutions or features you’ve considered.

It is currently possible to bypass the Kedro configuration by creating a load_ipython_extension function that executes code when the notebook is opened (like here: https://github.com/quantumblacklabs/kedro/blob/fc9d1c5d35981c51a651af467e79e93df065d354/kedro/extras/extensions/ipython.py#L121). However, we cannot access the objects created by reload_kedro (especially the session or the catalog, if we want to modify or use them). We need to duplicate almost all of the code to make it work, and this prevents combining several plugins.
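To make the limitation concrete, here is a minimal sketch of a plugin extension that looks for the variables `%reload_kedro` pushes into the shell (`catalog`, `context`, `session`, as listed earlier) via IPython's `user_ns` dict, and publishes its own global with `ipython.push`. It only works if `%reload_kedro` has already run in the same shell, which is exactly the ordering the issue says cannot be relied on. `get_kedro_objects` and `plugin_client` are hypothetical.

```python
# Variables that %reload_kedro puts into the notebook namespace.
KEDRO_NAMES = ("catalog", "context", "session")


def get_kedro_objects(ipython) -> dict:
    """Return whichever Kedro variables are present in the shell namespace so far."""
    return {name: ipython.user_ns[name] for name in KEDRO_NAMES if name in ipython.user_ns}


def load_ipython_extension(ipython) -> None:
    kedro_objects = get_kedro_objects(ipython)
    if "session" not in kedro_objects:
        # Ordering problem: this extension was loaded before %reload_kedro ran.
        print("Kedro session not found; run %reload_kedro, then %reload_ext this plugin")
        return
    # Hypothetical plugin object derived from the session, exposed as a notebook global.
    ipython.push({"plugin_client": f"client for {kedro_objects['session']!r}"})
```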

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 1
  • Comments: 10 (9 by maintainers)

Top GitHub Comments

2 reactions
AntonyMilneQB commented, Aug 31, 2022

Notes from technical design session on 31 August:

  • General agreement that the current implementation of the line magic entrypoint is not correct (e.g. needs_local_scope should be on the kedro-viz side)
  • Some (@jmholzer) agreed that we should remove the line magic entrypoint; others (@idanov @ankatiyar) had concerns that explicitly needing to do %load_ext kedro_viz is not as beginner-friendly as the existing workflow where %run_viz is registered automatically
  • @noklam did not like the idea of %load_ext kedro_viz itself loading kedro-viz (e.g. how do you provide CLI arguments like pipeline?). Note that there is a possible flow here that is consistent with %load_ext kedro, though: %load_ext kedro_viz executes %run_viz, but you can also call %run_viz yourself with arguments
  • @AhdraMeraliQB and @SajidAlamQB also like the %run_viz line magic existing, but @SajidAlamQB thinks it’s fine to have the extra %load_ext kedro_viz step as well
  • A more generic IPython entrypoint (and maybe a hook with ipython + catalog/context/etc. arguments) would remove the need for the %load_ext step. @AntonyMilneQB (and @Galileo-Galilei from the discussion above) think this is overkill for now and could always be added later
1 reaction
noklam commented, Aug 31, 2022

I am slightly in favor of loading %run_viz automatically.

Existing workflows

  1. kedro ipython/jupyter -> %reload_kedro (If you are in local)
  2. %load_ext kedro -> %reload_kedro (If you are in JupyterHub/Databricks)

I would expect 1 to be the more common case, so I think most people would only interact with %reload_kedro, and therefore they should interact with %run_viz in the same way. We could also add more line magics in the future (maybe one to save a static output, or something related to experiment tracking?), which makes me less keen on overloading load_ext too much.

Overall I agree the existing implementation is incorrect (arguably it’s a bug, just that no one is using it), but I would go for a lower-effort option, which may just be fixing the local_scope and fixing the line magic to accept the correct arguments (if we haven’t already fixed that?).
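As an illustration of the lower-effort fix, here is a minimal sketch of a line magic that parses its own arguments from `line` and receives the caller's namespace through `local_ns`. It is a hypothetical stand-in named after kedro-viz's `%run_viz`, not its actual code, and the `--port`/`--pipeline` options are assumptions. Setting `needs_local_scope = True` on the function has the same effect as IPython's `@needs_local_scope` decorator, which tells IPython to pass `local_ns`.

```python
import argparse
import shlex
from typing import Optional

# Hypothetical options; the real %run_viz may accept different ones.
_parser = argparse.ArgumentParser(prog="%run_viz", add_help=False)
_parser.add_argument("--port", type=int, default=None)
_parser.add_argument("--pipeline", default=None)


def run_viz(line: str = "", local_ns: Optional[dict] = None) -> dict:
    """Line magic body: IPython passes the text typed after %run_viz as `line`."""
    args = _parser.parse_args(shlex.split(line))
    # With needs_local_scope set, IPython also passes the caller's namespace,
    # which is where %reload_kedro has put `session`, `context` and `catalog`.
    session = (local_ns or {}).get("session")
    # A real implementation would start the visualisation server here.
    return {"port": args.port, "pipeline": args.pipeline, "has_session": session is not None}


# Same effect as decorating run_viz with IPython.core.magic.needs_local_scope.
run_viz.needs_local_scope = True
```

For example, `run_viz("--port 4142 --pipeline data_science", {"session": object()})` returns `{"port": 4142, "pipeline": "data_science", "has_session": True}`, while a bare `run_viz()` falls back to `None` defaults.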
