Technical design doc for `KedroSession`

The `KedroSession` ✨

The KedroSession is the object responsible for managing the lifecycle of a Kedro run. It has two main functions:

- Run execution: it ensures that all core components Kedro needs to execute a run are instantiated and that the run is executed properly.
- Persisting run data: the KedroSession can persist run data through the session store. The following data is saved in the session store:
  - package_name
  - project_path
  - session_id
  - CLI info: the command that was run and the run parameters
  - Git info: the commit SHA, the branch name, and whether the branch is dirty (has uncommitted changes)
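The sketch below assembles a run-data dictionary like the one listed above and gathers the Git details by calling the `git` CLI. The function names `describe_git` and `collect_run_data` and all dictionary keys are hypothetical and were chosen for this example. Kedro's own implementation may collect and name these fields differently.

```python
import subprocess
from pathlib import Path


def describe_git(project_path: Path) -> dict:
    """Hypothetical helper (not Kedro's API): commit SHA, branch and dirty flag.

    Returns an empty dict if git is not installed, the path does not exist,
    or the path is not inside a git repository with at least one commit.
    """

    def git(*args: str) -> str:
        # Run a git command inside the project directory and return its stdout.
        return subprocess.check_output(
            ["git", *args], cwd=project_path, text=True, stderr=subprocess.DEVNULL
        ).strip()

    try:
        commit_sha = git("rev-parse", "--short", "HEAD")
        # Prints "HEAD" instead of a branch name when HEAD is detached.
        branch = git("rev-parse", "--abbrev-ref", "HEAD")
        # `git status --short` prints nothing when the working tree is clean.
        dirty = bool(git("status", "--short"))
    except (subprocess.CalledProcessError, OSError):
        return {}
    return {"git": {"commit_sha": commit_sha, "branch": branch, "dirty": dirty}}


def collect_run_data(
    package_name: str, project_path: Path, session_id: str, command: str, params: dict
) -> dict:
    """Hypothetical helper: bundle the run data a session store might hold."""
    return {
        "package_name": package_name,
        "project_path": str(project_path),
        "session_id": session_id,
        "cli": {"command": command, "params": params},
        **describe_git(project_path),
    }
```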

Usage within Kedro 🏗

The KedroSession is a relatively new component within Kedro and, at the time of writing, is mainly used to manage run lifecycles and to support experiment tracking. The experiment tracking feature uses a session store implementation called the SQLiteStore, which persists data with SQLite. The other session store implementations available in Kedro are:

- BaseSessionStore: the base class for all session stores; it does not persist any data itself
- ShelveStore: an implementation that uses Python's `shelve` module to persist data
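This sketch contrasts a store that persists nothing with one that does. It assumes a session store is a dict-like object whose `save()` method runs when the session closes. `NoopSessionStore` and `ShelveSessionStore` are hypothetical stand-ins for `BaseSessionStore` and `ShelveStore` and are not Kedro's classes. The persisting variant uses the standard-library `shelve` module and keys each entry by session ID.

```python
import shelve
from collections import UserDict
from pathlib import Path


class NoopSessionStore(UserDict):
    """Hypothetical base store: holds run data in memory, persists nothing."""

    def __init__(self, path: str, session_id: str) -> None:
        super().__init__()
        self._path = path
        self._session_id = session_id

    def save(self) -> None:
        # Base behaviour: the data is simply discarded when the session closes.
        pass


class ShelveSessionStore(NoopSessionStore):
    """Hypothetical persisting store built on the standard-library shelve module."""

    def save(self) -> None:
        Path(self._path).mkdir(parents=True, exist_ok=True)
        # One shelf entry per session, keyed by session_id.
        with shelve.open(str(Path(self._path) / "sessions")) as db:
            db[self._session_id] = dict(self.data)

    @staticmethod
    def load(path: str, session_id: str) -> dict:
        # Read back the data saved for one session ({} if nothing was saved).
        with shelve.open(str(Path(path) / "sessions")) as db:
            return db.get(session_id, {})
```

Because only `save()` differs, both stores behave the same during a run. The persistence backend can therefore be swapped without touching the session logic.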

Relation of a `run` and a `session` 🧑‍🤝‍🧑

While working on https://github.com/kedro-org/kedro/issues/1273, it was decided that a Kedro session and a Kedro run have a one-to-one mapping. Once a session is created, only one full pipeline run can be kicked off during that session's lifetime. In practice, Kedro manages this for you under the hood when `kedro run` is executed.
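A simple guard can enforce a one-to-one mapping between sessions and runs. The sketch below uses a hypothetical `SingleRunSession` class and `SessionRunError` exception, neither of which is Kedro's API. It refuses a second `run()` call on the same session.

```python
from typing import Any, Callable


class SessionRunError(Exception):
    """Hypothetical error raised when a session is asked to run twice."""


class SingleRunSession:
    """Hypothetical session that allows exactly one pipeline run."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self._run_called = False

    def run(self, pipeline: Callable[[], Any]) -> Any:
        if self._run_called:
            raise SessionRunError(
                f"Session {self.session_id} has already executed a run; "
                "create a new session to run again."
            )
        # Mark the session as used before executing, so a failed run also consumes it.
        self._run_called = True
        return pipeline()
```

This sketch marks the session as used before the pipeline executes, so even a failed run consumes the session. Marking it only after a successful run would be an equally valid choice.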

FAQ ❓

How does a Kedro user use KedroSession?
As a Kedro user, you don't need to access the session directly. When you execute the `kedro run` command, a new session is created automatically. The session kicks off the pipeline run and, when the run finishes, closes again, persisting any run data if the project is configured with a persistent session store.
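The create, run, close lifecycle described above maps naturally onto a Python context manager. This is a minimal sketch under stated assumptions. `Session`, `MemoryStore` and `run_command` are hypothetical names, and the session ID format is arbitrary. The store's `save()` stands in for whatever persistence the configured session store provides. The sketch mimics what happens behind `kedro run` rather than reproducing Kedro's code.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Optional


class MemoryStore(dict):
    """Hypothetical persistent store: save() snapshots the data into `persisted`."""

    def __init__(self) -> None:
        super().__init__()
        self.persisted: Optional[dict] = None

    def save(self) -> None:
        self.persisted = dict(self)


class Session:
    """Hypothetical create -> run -> close lifecycle, not Kedro's KedroSession."""

    def __init__(self, store: MemoryStore) -> None:
        self.store = store
        # Arbitrary timestamp-based ID, used here only for illustration.
        self.store["session_id"] = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")

    def run(self, pipeline: Callable[[], Any]) -> Any:
        return pipeline()

    def close(self) -> None:
        # Persist run data; a non-persistent store would make save() a no-op.
        self.store.save()

    def __enter__(self) -> "Session":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.close()  # runs even if the pipeline raised
        return False  # propagate any exception from the run


def run_command(store: MemoryStore, pipeline: Callable[[], Any]) -> Any:
    """Hypothetical `kedro run`-style entry point: the user never touches the session."""
    with Session(store) as session:
        return session.run(pipeline)
```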

What about using KedroSession in an interactive workflow?
When using Jupyter or IPython, you can access the active session object or create a new one. You can then retrieve the session_id and the run data that will be stored, load the context, and execute a run. However, we do not encourage using the session for anything other than checking the session_id and run data.

Related GitHub issues and PRs:

Top GitHub Comments

lorenabalan commented, Mar 10, 2022

@AntonyMilneQB I think not being able to run anything in a Jupyter notebook or IPython takes away a lot from the Jupyter users we're trying to convert to Python and Kedro. If we do that, we need to seriously consider the consequences and clearly draw the boundaries of our target audience, because it sounds like they would be very different.

noklam commented, Mar 9, 2022

@AntonyMilneQB For me, what matters is the ability to do checkpoint debugging in an interactive environment. It may be that I am not doing it the right way, but I am interested in how others use Kedro in IPython/notebooks beyond EDA.

To recap, this is the workflow I have used for development in the past:

1. Run a partial pipeline and stop at the point of interest.
2. Do whatever I need in the notebook environment, e.g. change the definition of a node, or inject/overwrite some of the data in the catalog.
3. Continue running the pipeline until I get the desired output.