Remove kedro.line_magic entrypoint
This issue has changed quite a bit since it was originally written. Look at the final post here to see what we now need to do.
Original issue here for posterity.
Description
Is your feature request related to a problem? A clear and concise description of what the problem is: “I’m always frustrated when …”
I am the author of several Kedro plugins, some open source (kedro-mlflow) and some at work. It turns out that most plugins perform one or more of the following tasks:
- provide CLI commands
- modify pipeline/catalog/node behaviour with hooks
- add new datasets
Since Kedro is a data science framework, and since Jupyter is a standard for the exploration phase of a data science project, it seems natural to make Kedro compatible with Jupyter. For now, Kedro offers the following functionality:
- set up Kedro configuration (bootstrapping the project, creating a session and activating it)
- declare some global variables accessible from the shell (catalog, session, context)
- register custom line magics from plugins
This works well for the core library, but the user experience would be more pleasant if plugins were more compatible with notebooks. I suggest the following actions:
- add documentation on how to create a custom line magic in your plugin, because this feature is not documented yet. I had to read the code to discover it, and I struggled a little to make it work. I can make a PR, maybe by adding a subsection in the plugin section?
- add the possibility to execute some functions or push global variables to the notebook session when it is opened via a Kedro CLI command
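For the proposed documentation, here is a minimal sketch of the discovery side. It assumes that a plugin exposes its line magic under the kedro.line_magic entry-point group (the group named in this issue's title), and that every entry point in that group is loaded and registered as a line magic. The helpers discover_entry_points, load_line_magics and register_line_magics are hypothetical names, not Kedro's code. The only external APIs used are importlib.metadata and IPython's InteractiveShell.register_magic_function, which the shell argument is assumed to provide.

```python
from importlib.metadata import EntryPoint, entry_points
from typing import Callable, Dict, Iterable, List

# Entry-point group named in this issue's title. A plugin would declare, e.g. in setup.py:
#     entry_points={"kedro.line_magic": ["line_magic = my_plugin.magics:reload_my_plugin"]}
# (my_plugin.magics:reload_my_plugin is a hypothetical target.)
LINE_MAGIC_GROUP = "kedro.line_magic"


def discover_entry_points(group: str = LINE_MAGIC_GROUP) -> List[EntryPoint]:
    """Return the installed entry points in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # Python 3.8/3.9 return a plain dict


def load_line_magics(eps: Iterable[EntryPoint]) -> Dict[str, Callable]:
    """Import each entry point, keyed by the loaded function's __name__."""
    return {func.__name__: func for func in (ep.load() for ep in eps)}


def register_line_magics(shell, magics: Dict[str, Callable]) -> None:
    """Register each callable as a line magic on an IPython-like shell."""
    for name, func in magics.items():
        shell.register_magic_function(func, magic_kind="line", magic_name=name)
```

In this sketch, magics are keyed by the function's __name__ rather than the entry-point name. This avoids collisions if several plugins happen to use the same entry-point name.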
Context
Why is this change important to you? How would you use it? How can it benefit other users?
It is a common pattern to “set up” your plugin, e.g. to instantiate a connection to an external service (kedro-mlflow creates a connection to the mlflow server, and I have a SAS plugin which creates a connection to the database…), especially in the “before_pipeline_run” hook.
If I create a custom line magic (say %reload_kedro_mlflow), it is discovered by Kedro and is accessible inside the notebook or the IPython session. However, I’d like it to run automatically when the notebook is opened via the Kedro CLI, instead of forcing the user to run it manually.
Possible Implementation
(Optional) Suggest an idea for implementing the addition or change.
- Add an IpythonHook() with an on_ipython_start @hook_spec to enable plugins to set up their own configuration when the session is activated.
- After the line magics are discovered (https://github.com/quantumblacklabs/kedro/blob/fc9d1c5d35981c51a651af467e79e93df065d354/kedro/extras/extensions/ipython.py#L104-L106), call hook_manager.on_ipython_start(path, metadata, session, context, catalog) so that plugins can set up some configuration automatically.
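The following sketch makes the two implementation bullets concrete. on_ipython_start does not exist in Kedro, and IPythonSpecs, MiniHookManager and MlflowLikePlugin are hypothetical names. Kedro's real hook manager is built on pluggy, so MiniHookManager is only a self-contained stand-in. Like pluggy, it passes each implementation just the arguments that implementation declares.

```python
import inspect
from typing import Any, List


class IPythonSpecs:
    """Hypothetical hook specification proposed in this issue (not part of Kedro)."""

    def on_ipython_start(self, path, metadata, session, context, catalog):
        """Called after reload_kedro has created the session and registered line magics."""


class MiniHookManager:
    """Tiny stand-in for Kedro's pluggy-based hook manager, for illustration only."""

    def __init__(self, spec: type) -> None:
        self._spec = spec
        self._plugins: List[Any] = []

    def register(self, plugin: Any) -> None:
        self._plugins.append(plugin)

    def call(self, hook_name: str, **kwargs: Any) -> List[Any]:
        # Reject arguments that the spec does not declare.
        spec_params = inspect.signature(getattr(self._spec, hook_name)).parameters
        unexpected = set(kwargs) - set(spec_params)
        if unexpected:
            raise TypeError(f"{hook_name} got unexpected arguments: {sorted(unexpected)}")
        results = []
        for plugin in self._plugins:
            impl = getattr(plugin, hook_name, None)
            if impl is None:
                continue  # this plugin does not implement the hook
            # Pass each implementation only the arguments it asks for.
            wanted = inspect.signature(impl).parameters
            results.append(impl(**{k: v for k, v in kwargs.items() if k in wanted}))
        return results


class MlflowLikePlugin:
    """Hypothetical plugin that only needs the session and the catalog."""

    def __init__(self) -> None:
        self.received = {}

    def on_ipython_start(self, session, catalog):
        # A real plugin could open its connection to an external service here.
        self.received = {"session": session, "catalog": catalog}
        return "ready"


# Call site sketch, e.g. right after the line magics are registered:
# hook_manager.call("on_ipython_start", path=path, metadata=metadata,
#                   session=session, context=context, catalog=catalog)
```

Because implementations can accept a subset of the arguments, a plugin such as kedro-mlflow could ask only for session and catalog. The spec could also gain new arguments later without breaking existing implementations.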
Possible Alternatives
(Optional) Describe any alternative solutions or features you’ve considered.
It is currently possible to bypass Kedro configuration by creating a load_ipython_extension function that executes code when the notebook is opened (like here: https://github.com/quantumblacklabs/kedro/blob/fc9d1c5d35981c51a651af467e79e93df065d354/kedro/extras/extensions/ipython.py#L121). However, we cannot access the objects created by reload_kedro (especially the session or the catalog, if we want to modify or use them). We need to duplicate almost all the code to make it work, and this prevents combining several plugins.
Notes from technical design session on 31 August:
- needs_local_scope should be on kedro-viz side)
- %load_ext kedro_viz is not as beginner-friendly as the existing workflow where %run_viz is registered automatically
- %load_ext kedro_viz itself loading kedro-viz, e.g. how do you provide CLI arguments like pipeline? Note there is a flow here that’s possible and consistent with %load_ext kedro, though, in which %load_ext kedro_viz executes %run_viz but you can also call %run_viz yourself with arguments
- %run_viz line magic existing, but @SajidAlamQB thinks it’s fine to have the extra %load_ext kedro_viz step as well
- ipython + catalog/context/etc. arguments) would remove the need for the %load_ext step. @AntonyMilneQB (and @Galileo-Galilei from above discussion) think this is overkill for now and could always be added later

I am slightly in favor of loading %run_viz automatically.

Existing workflows:
1. kedro ipython/jupyter -> %reload_kedro (if you are working locally)
2. %load_ext kedro -> %reload_kedro (if you are in JupyterHub/Databricks)

I would expect 1 to be the more common case, so I think most people would only interact with %reload_kedro, and therefore they should interact with %run_viz in the same way. We could also add more line magics in the future (maybe a line magic to save a static output, or something related to experiment tracking?), which makes me less inclined to overload load_ext too much.

Overall I agree the existing implementation is incorrect (arguably it’s a bug; it’s just that no one is using it), but I would go for a lower-effort option, which may just be fixing the local_scope and fixing the line magic to accept the correct arguments (if we haven’t already fixed it?).
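The flow noted in the design session works like this: %load_ext kedro_viz registers %run_viz and runs it once, and users can still call %run_viz themselves with arguments such as a pipeline name. It could look roughly like the sketch below, under some assumptions. run_viz, its --pipeline flag and the returned dict are hypothetical and do not mirror kedro-viz's actual implementation. needs_local_scope, register_magic_function and user_ns are IPython APIs. A fallback decorator lets the sketch run without IPython installed.

```python
import argparse
import shlex
from typing import Any, Dict, Optional

try:
    from IPython.core.magic import needs_local_scope
except ImportError:  # let the sketch run without IPython installed
    def needs_local_scope(func):
        """Fallback with the same effect as IPython's decorator: set a flag."""
        func.needs_local_scope = True
        return func


@needs_local_scope
def run_viz(line: str = "", local_ns: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical %run_viz line magic accepting an optional `--pipeline NAME`."""
    parser = argparse.ArgumentParser(prog="%run_viz", add_help=False)
    parser.add_argument("--pipeline", default="__default__")
    args = parser.parse_args(shlex.split(line))
    namespace = local_ns or {}
    # A real plugin would launch its visualisation here; this sketch only
    # reports which pipeline it would show and whether a session is available.
    return {"pipeline": args.pipeline, "has_session": "session" in namespace}


def load_ipython_extension(ipython) -> None:
    """Called by IPython on `%load_ext <module>` for the module defining it."""
    ipython.register_magic_function(run_viz, magic_kind="line", magic_name="run_viz")
    # Run once with defaults, passing the notebook namespace explicitly;
    # users can still call `%run_viz --pipeline NAME` themselves afterwards.
    run_viz("", local_ns=ipython.user_ns)
```

Passing ipython.user_ns explicitly on the automatic first call keeps the namespace unambiguous. With the needs_local_scope flag set, the local scope is otherwise supplied by IPython when the user types %run_viz in a cell.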