New platform-dependent `%launch_viz` line magic

Ideal outcome:

Make a new %launch_viz line magic in kedro-viz so that it is available in any notebook that has the Kedro IPython extension loaded
%launch_viz starts a kedro-viz server and supplies a URL that the user can click to open Kedro-Viz in another browser window
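
Below is a minimal sketch of how the magic could be wired up. It makes two assumptions: the magic is registered through IPython’s standard `load_ipython_extension` hook, and the server is started as a `kedro viz` subprocess on a free port. How it would hook into the Kedro extension instead is left open. The names `find_free_port` and `launch_viz` are hypothetical, and this is not the existing kedro-viz code. It also leaves the hard part, the URL, unsolved:

```python
import socket
import subprocess


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def launch_viz(line: str = "") -> subprocess.Popen:
    """Hypothetical %launch_viz body: start `kedro viz` in the background."""
    port = find_free_port()
    # Assumption: the `kedro` CLI is on PATH and the working directory
    # is the Kedro project root. `line` (the magic's arguments) is ignored here.
    process = subprocess.Popen(["kedro", "viz", "--port", str(port), "--no-browser"])
    print(f"Kedro-Viz started on port {port}")
    return process


def load_ipython_extension(ipython) -> None:
    """Called by %load_ext; registers launch_viz as the %launch_viz line magic."""
    ipython.register_magic_function(launch_viz, magic_kind="line", magic_name="launch_viz")
```

Binding to port 0 lets the OS pick an unused port. There is a small window between releasing that port and kedro viz binding it, which is probably acceptable for an interactive tool.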

Required steps:

Make sure the current code is still the best way to launch the server from inside the line magic
Potentially difficult. Work out the correct URL to access the kedro-viz instance on various platforms (Databricks, SageMaker, etc.).
Potentially difficult. Work out how to programmatically obtain this URL
Work out how to automatically detect which platform the notebook is running on
Output the correct URL or, if we can’t figure it out ourselves, a useful message that might help the user find their kedro-viz instance
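
Here is one possible way to detect the platform and fall back to a helpful message, assuming environment checks are reliable enough. `DATABRICKS_RUNTIME_VERSION` is set on Databricks clusters and `JUPYTERHUB_SERVICE_PREFIX` on JupyterHub single-user servers. The SageMaker metadata file path is an assumption that still needs checking. The function names are hypothetical:

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Assumption: this file exists on SageMaker notebook instances/Studio; unverified.
SAGEMAKER_MARKER = Path("/opt/ml/metadata/resource-metadata.json")


def detect_platform(
    environ: Mapping[str, str] = os.environ,
    sagemaker_marker: Path = SAGEMAKER_MARKER,
) -> str:
    """Best-effort guess at where the notebook is running."""
    if "DATABRICKS_RUNTIME_VERSION" in environ:
        return "databricks"
    if sagemaker_marker.exists():
        return "sagemaker"
    if "JUPYTERHUB_SERVICE_PREFIX" in environ:
        return "jupyterhub"
    return "unknown"


def location_message(platform: str, port: int, url: Optional[str] = None) -> str:
    """Return the URL if we have one, otherwise a hint that still names the port."""
    if url is not None:
        return f"Kedro-Viz is running at {url}"
    return (
        f"Kedro-Viz started on port {port}, but its URL could not be determined "
        f"for platform '{platform}'. Try opening port {port} through your "
        "notebook server's proxy."
    )
```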

To consider:

Which arguments can the user pass, and how? Ideally, all the same flags you can use with kedro viz would be available in the same way
How do we kill the server? See the note in https://github.com/kedro-org/kedro/pull/1355 on the current bug with %run_viz, where the server isn’t killed
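
The sketch below covers both points, assuming the magic shells out to the kedro viz CLI. Forwarding the magic’s line through `shlex.split` gives the user exactly the same flags as kedro viz. Keeping a handle on the process lets us stop the old server before starting a new one, and again when Python exits. `VizServerHandle` and `build_viz_command` are hypothetical names:

```python
import atexit
import shlex
import subprocess
from typing import List, Optional


def build_viz_command(line: str) -> List[str]:
    """Forward the magic's arguments unchanged to the `kedro viz` CLI."""
    # shlex.split respects quoting, e.g. %launch_viz --pipeline "data science"
    return ["kedro", "viz", *shlex.split(line)]


class VizServerHandle:
    """Keeps track of the single Kedro-Viz process started from the notebook."""

    def __init__(self) -> None:
        self.process: Optional[subprocess.Popen] = None

    def start(self, command: List[str]) -> subprocess.Popen:
        # Re-running the cell replaces the old server instead of leaking it.
        self.stop()
        self.process = subprocess.Popen(command)
        return self.process

    def stop(self, timeout: float = 5.0) -> None:
        if self.process is not None and self.process.poll() is None:
            self.process.terminate()
            try:
                self.process.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                # Escalate if the server ignores the terminate request.
                self.process.kill()
                self.process.wait()
        self.process = None


_server = VizServerHandle()
# Stop the server when the Python process exits normally.
atexit.register(_server.stop)
```

Inside the magic, this would amount to `_server.start(build_viz_command(line))`. One caveat: if kedro viz spawns child processes (for example with --autoreload), terminating only the parent may not stop the server, so that is worth checking.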

Issue Analytics

Created a year ago
Comments: 6 (6 by maintainers)

Top GitHub Comments

2 reactions

AntonyMilneQB commented, Aug 12, 2022

How to efficiently develop with Kedro-Viz on Databricks

After much trial and error, I have come up with a much more streamlined way to iterate on code developed for Databricks. It should make the development loop much faster, since with this approach there’s no need to restart the cluster or manually handle repos 🎉

Make a branch for your work
Run make build, then git add -f package/kedro_viz/html and push to GitHub. Committing the built HTML is only needed temporarily while you develop on your branch, so that you can pip install from GitHub; it should not remain there when you merge to main.
On Databricks, make sure that kedro-viz and kedro are not installed as cluster libraries.
In your Databricks notebook, run (fill out NAME-OF-BRANCH):

%pip uninstall -y kedro-viz
%pip install git+https://github.com/kedro-org/kedro-viz.git@NAME-OF-BRANCH#subdirectory=package

Warning. Remember that there’s quite a bit of confusion around packages pip-installed with different scopes; see https://github.com/kedro-org/kedro-viz/issues/831. In short, use %pip (not %sh pip) if you want a notebook-scoped install, and make sure that kedro and kedro-viz are installed with the same scope (cluster or notebook).
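
Here is a rough way to check the last point from inside the notebook. It assumes that packages installed with different scopes end up in different site-packages directories, since notebook-scoped libraries live in a separate per-notebook environment. We haven’t verified this on every runtime. The helper names are hypothetical:

```python
from importlib import metadata
from pathlib import Path
from typing import Optional


def install_location(dist_name: str) -> Optional[Path]:
    """Return the directory a distribution was installed into, or None if missing."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # locate_file("") resolves relative to the site-packages directory
    # that holds the distribution's .dist-info folder.
    return Path(dist.locate_file("")).resolve()


def same_install_scope(first: str = "kedro", second: str = "kedro-viz") -> bool:
    """True if both distributions are installed and share a site-packages directory."""
    loc_first, loc_second = install_location(first), install_location(second)
    return loc_first is not None and loc_first == loc_second
```

Running `same_install_scope()` in a cell should return True when kedro and kedro-viz were installed with the same scope.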

To make a test project if one doesn’t already exist:

%sh test -d iris || yes "" | kedro new -s pandas-iris

Then load up the Kedro IPython extension, make sure you’re pointing to the right project path and do as you please:

%load_ext kedro.extras.extensions.ipython 
%reload_kedro iris
%run_viz

Whenever you make changes to your branch, all you need to do is push to GitHub and then re-run your notebook. This will pip install the latest changes to the branch directly from GitHub. No need to restart the cluster or clone repos any more.
Make sure you remove the package/kedro_viz/html folder before merging to main.

Note. It might seem that the Databricks Repos feature would give a smoother development process, but it doesn’t. Every time you change your branch, you would need to pull the repo and reinstall it on the cluster, which means restarting the cluster each time (slow). So don’t try doing it that way…

1 reaction

AntonyMilneQB commented, Jul 21, 2022

Let’s assume there will be two different ways that %launch_viz works:

Databricks: use the above. So far tested on Azure; still need to test on AWS and GCP.
Jupyter servers: use jupyter-server-proxy. So far tested locally without JupyterHub; still need to test on SageMaker and JupyterHub. Also, what about Binder?
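
The URL for each method could be built roughly as follows. The jupyter-server-proxy form (`<base_url>proxy/<port>/`) is how that package exposes local ports. The Databricks driver-proxy path is an assumption on our side and may not match the method referred to above. The workspace host, org ID and cluster ID would still need to be obtained programmatically, possibly from the Spark config. Both function names are hypothetical:

```python
def databricks_proxy_url(workspace_host: str, org_id: str, cluster_id: str, port: int) -> str:
    """Assumed Databricks driver-proxy URL for a port on the driver node."""
    return f"https://{workspace_host}/driver-proxy/o/{org_id}/{cluster_id}/{port}/"


def jupyter_proxy_url(base_url: str, port: int) -> str:
    """jupyter-server-proxy serves local ports under <base_url>proxy/<port>/."""
    # On JupyterHub, base_url would be e.g. JUPYTERHUB_SERVICE_PREFIX ("/user/alice/").
    if not base_url.endswith("/"):
        base_url += "/"
    return f"{base_url}proxy/{port}/"
```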

Next steps:

the next steps above for the Databricks method
look at JupyterDash; they might have already figured out all the Jupyter proxy stuff…
make a similar %launch_viz that starts the process and links to it (find out whether there’s a programmatic way to get the URL through jupyter-server-proxy)
something that detects your platform and switches between the two methods above, starts the process, and outputs something useful (“Kedro-Viz started on port X”) even if it can’t work out what the platform is
get %launch_viz to accept arguments such as --pipeline
make the launcher button; this only works with jupyter-server-proxy (it probably can’t take arguments, so it would only run kedro viz --autoreload; we also need to think about how to get the project path there)
work out whether we need a way to kill the kedro-viz process
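
For the launcher button, jupyter-server-proxy can register a server process through a `jupyter_serverproxy_servers` entry point. The entry point names a function that returns a config dict, and `{port}` in the command is replaced by the port the proxy allocates. The sketch below works under that assumption. The function name `setup_kedro_viz` and the module path in the comment are hypothetical. As noted above, it can only run a fixed command:

```python
def setup_kedro_viz() -> dict:
    """Hypothetical jupyter-server-proxy entry point for a Kedro-Viz launcher tile."""
    return {
        # jupyter-server-proxy substitutes {port} with the port it allocates.
        # Open question from the list above: how to run this in the project directory.
        "command": ["kedro", "viz", "--port", "{port}", "--no-browser", "--autoreload"],
        "launcher_entry": {"title": "Kedro-Viz"},
    }


# Registration, e.g. in setup.py (hypothetical module path):
# entry_points={"jupyter_serverproxy_servers": ["kedro-viz = kedro_viz.launcher:setup_kedro_viz"]}
```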