New platform-dependent `%launch_viz` line magic

Ideal outcome:

  • Add a new %launch_viz line magic to kedro-viz so that it is available in any notebook that has the Kedro IPython extension loaded
  • %launch_viz starts a kedro-viz server and outputs a URL that the user can click to open kedro-viz in another browser window

Required steps:

  • Make sure the current code is still the best way to launch the server from inside the line magic
  • Potentially difficult: work out the correct URL to access the kedro-viz instance on various platforms (Databricks, SageMaker, etc.)
  • Potentially difficult: work out how to obtain this URL programmatically
  • Work out how to detect automatically which platform the notebook is running on
  • Output the correct URL or, if we can't figure out the URL ourselves, a useful message that helps the user find their kedro-viz instance
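
The detection and fallback-message steps above could be sketched roughly as follows. This is only an illustration, not the kedro-viz implementation: the function names `detect_platform` and `viz_message` are hypothetical, the environment variables checked (`DATABRICKS_RUNTIME_VERSION`, `JUPYTERHUB_SERVICE_PREFIX`) are assumed to be reliable markers for their platforms, and the `/proxy/<port>/` path assumes jupyter-server-proxy is installed on the Jupyter server. No Databricks URL is constructed here because the issue does not state its format.

```python
import os
from typing import Mapping


def detect_platform(environ: Mapping[str, str] = os.environ) -> str:
    """Guess the notebook platform from environment variables (assumed markers)."""
    # Assumption: Databricks runtimes export DATABRICKS_RUNTIME_VERSION.
    if "DATABRICKS_RUNTIME_VERSION" in environ:
        return "databricks"
    # Assumption: JupyterHub single-user servers export JUPYTERHUB_SERVICE_PREFIX.
    if "JUPYTERHUB_SERVICE_PREFIX" in environ:
        return "jupyterhub"
    return "unknown"


def viz_message(port: int, environ: Mapping[str, str] = os.environ) -> str:
    """Return a clickable path when it can be worked out, else a plain message."""
    if detect_platform(environ) == "jupyterhub":
        # Assumption: jupyter-server-proxy serves local ports under <prefix>/proxy/<port>/.
        prefix = environ["JUPYTERHUB_SERVICE_PREFIX"].rstrip("/")
        return f"Kedro-Viz started on port {port}: {prefix}/proxy/{port}/"
    # Fallback: still tell the user where the server is listening.
    return f"Kedro-Viz started on port {port}"
```

Passing the environment in as a mapping keeps the detection logic easy to test without running on each platform.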

To consider:

  • Which arguments can the user pass, and how? Ideally, all the flags accepted by kedro viz would be available in the same way
  • How to kill the server? See the note in https://github.com/kedro-org/kedro/pull/1355 for the current bug with %run_viz, where the server is not killed
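
One possible way to accept the same flags as kedro viz is to split the magic's argument string with `shlex` and feed it to `argparse`. The sketch below is an assumption-laden illustration: `parse_viz_args` is a hypothetical helper, only a subset of flags is shown, and the defaults (`127.0.0.1`, `4141`) are assumed to match the CLI.

```python
import argparse
import shlex


def parse_viz_args(line: str) -> argparse.Namespace:
    """Parse the text following %launch_viz into kedro-viz style options."""
    parser = argparse.ArgumentParser(prog="%launch_viz", add_help=False)
    # Assumed subset of the flags accepted by `kedro viz`.
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=4141)
    parser.add_argument("--pipeline", default=None)
    parser.add_argument("--env", default=None)
    parser.add_argument("--autoreload", action="store_true")
    # shlex.split handles quoted values the way a shell would.
    return parser.parse_args(shlex.split(line))
```

For example, `parse_viz_args("--pipeline ds --port 4142")` yields a namespace with `pipeline="ds"` and `port=4142`. Note that `argparse` raises `SystemExit` on unknown flags, which a real line magic would probably want to catch and report more gently.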

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

2 reactions
AntonyMilneQB commented, Aug 12, 2022

How to efficiently develop with Kedro-Viz on Databricks

After much trial and error, I have come up with a more streamlined way to iterate on code being developed for Databricks. This should make the development loop much faster, since there's no need to restart the cluster or handle repos manually 🎉

  1. Make a branch for your work
  2. Run make build, then git add -f package/kedro_viz/html, and push to GitHub. This is needed temporarily while developing on your branch so that you can pip install from GitHub, but the folder should not remain there when you merge to main
  3. On Databricks, make sure that kedro-viz and kedro are not installed as cluster libraries.
  4. In your Databricks notebook, run (fill out NAME-OF-BRANCH):
%pip uninstall -y kedro-viz
%pip install git+https://github.com/kedro-org/kedro-viz.git@NAME-OF-BRANCH#subdirectory=package

Warning. Differently scoped pip-installed packages cause quite a bit of confusion; see https://github.com/kedro-org/kedro-viz/issues/831. In short, use %pip (not %sh pip) if you want a notebook-scoped install, and ensure that kedro and kedro-viz are installed with the same scope (cluster or notebook).

  5. To make a test project if one doesn’t already exist:
%sh test -d iris || yes "" | kedro new -s pandas-iris
  6. Then load up the Kedro IPython extension, make sure you’re pointing to the right project path and do as you please:
%load_ext kedro.extras.extensions.ipython
%reload_kedro iris
%run_viz
  7. Whenever you make changes to your branch, all you need to do is push to GitHub and then re-run your notebook. This will pip install the latest changes to the branch directly from GitHub. No need to restart the cluster or clone repos any more.
  8. Make sure you remove the package/kedro_viz/html folder before merging to main.

Note. It seems like using the Databricks repos feature would be a smoother development process, but it’s not. Every time you make a change to your branch you would need to pull the repo and reinstall on cluster, which means restarting the cluster every time (=slow). So don’t try doing it that way…

1 reaction
AntonyMilneQB commented, Jul 21, 2022

Let’s assume there will be two different ways that %launch_viz would work:

  1. Databricks: use the above. So far tested on Azure; still need to test on AWS and GCP.
  2. Jupyter servers: use jupyter-server-proxy. So far tested locally without JupyterHub; still need to test on Sagemaker and JupyterHub. Also what about Binder?

Next steps:

  1. above next steps for databricks method
  2. look at jupyter dash. They might have figured out all the jupyter proxy stuff already…
  3. make similar %launch_viz that starts process and links to it (find out if there’s a programmatic way to get the URL through jupyter-server-proxy)
  4. something that detects your platform and switches between the above two methods; starts the process and outputs something useful (“Kedro-Viz started on port X”) even if it can’t work out what the platform is
  5. get %launch_viz to take arguments for --pipeline etc.
  6. make the launcher button - only works for jupyter-server-proxy (probably can’t take arguments; would only be kedro viz --autoreload; need to think about how to get project path there)
  7. work out whether we need a way to kill the kedro-viz process