# Experiment Tracking in Kedro
## Why should we care about Experiment Tracking?
Experiment tracking is a way to record all information that you would need to recreate a data science experiment. We think of it as logging for parameters, metrics, models and other artefacts.
Kedro currently has parts of this functionality. For example, it’s possible to log parameters as part of your codebase and snapshot models and other artefacts like plots with Kedro’s versioning capabilities for datasets. However, Kedro is missing a way to log metrics and capture all this logged metadata as a timestamped run of an experiment. It is also missing a way for users to visualise, discover and compare this logged metadata.
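For concreteness, here is a minimal sketch of the dataset-versioning capability referred to above, assuming Kedro 0.17.x class paths (they may differ in other releases):

```python
# A minimal sketch of Kedro's existing dataset versioning: each save
# writes a new timestamped copy of the artefact, which is the
# "snapshot" capability mentioned above. Paths assume Kedro 0.17.x.
from kedro.io import DataCatalog, Version
from kedro.extras.datasets.pickle import PickleDataSet

catalog = DataCatalog(
    {
        "regressor": PickleDataSet(
            filepath="data/06_models/regressor.pickle",
            # load=None -> latest version; save=None -> new timestamped version
            version=Version(load=None, save=None),
        )
    }
)
```

What is missing, and what this issue proposes, is the same treatment for metrics plus a run-level view that ties all of the logged metadata together.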
This change is essential to us because we want to standardise how logging for ML is done. There should be one easy way to capture this information, and we’re going to give users the Kedro way to do this.
This functionality is also expected to increase Kedro Lab usage by data scientists: anecdotally, people doing the data engineering workflow get the most benefit from Kedro-Viz today, while the data science workflow is not well served.
## What evidence do we have to suggest that we do this?
Our users sense the gap, and one of the most common usage patterns of Kedro is with MLflow Tracking, which provides this additional functionality. We have seen evidence here:
- Conversations with us here and here
- Development work:
  - Galileo-Galilei's popular `kedro-mlflow` plugin
  - The MLflow integration in the `kedro-kubeflow` plugin
  - Kedro Starters using scikit-learn and MLflow
  - Model management and tracking (Internal)
- Articles:
- Presentations and conference talks:
We also know that our internal users relied on PerformanceAI for this functionality. We have since sunset PerformanceAI, but it was fantastic to use because:
- It allowed multiple collaborators to share results
- It integrated nicely with Kedro
- The UI was great
Our vertical teams, namely C1 (@deepyaman), InsureX (@imdoroshenko @benhorsburgh) and OptimusAI (@mkretsch327) consider this high priority and will be confirmed users of this functionality.
## What metrics will we track to prove the success of this feature?
- `kedro viz` terminal runs - a metric that points to the use of this feature
- Full adoption of the feature by all vertical teams
## What design requirements do we have?
We must allow users to:
- Keep track of their metrics (see the sketch after this list)
- See the concept of an experiment on Kedro Lab
We must think about:
- Minimising the number of changes a user would need to make to activate this functionality in a current Kedro project
- How users would share their experiment results with other team members
- How this functionality would work with other MLflow tools (e.g. model serving)
- How users would disable runs so that they don’t clutter run history
- How this functionality works with the `KedroSession`
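To make the metrics requirement concrete, here is a hypothetical sketch of the workflow under discussion. None of this is an existing Kedro API; the node below and the idea of a tracking-aware catalog entry are illustrative only:

```python
# Hypothetical sketch: a node returns a plain dict of metrics. The idea
# under discussion is that a tracking-aware catalog entry (not yet
# implemented) would record this dict against the timestamped run,
# rather than overwriting a file, so runs can be compared later.
def evaluate_model(regressor, X_test, y_test) -> dict:
    """Node that emits metrics as ordinary Python data."""
    return {"r2_score": regressor.score(X_test, y_test)}
```

The activation cost matters here: ideally, an existing node that already returns a dict of metrics would need nothing more than a catalog change to opt in.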
---

Some thoughts after today's tech design session.
The general statement above gives me the impression that Kedro is offering some "MLOps" capabilities.
I tried to group the experiment tracking features into 2 different categories:
I think the main focus of this GH issue is on point 1, and I see a lot of consideration given to `mlflow`, but I'd argue `mlflow` isn't the best reference for this space. There are many more features offered by tools like `wandb`, `neptune`, or `clearml`. This article sums them up quite well as "Dashboard as Operating System".

So my question is: how much do we expect Kedro to play in this space, and how far do we want to go? Or, what are the things that we are not going to do for experiment tracking? (Like Kedro is not going to do any orchestration work.) @yetudada @NeroOkwa CC: @AntonyMilneQB
---

(Comment copied over, originally written by @limdauto)
@AntonyMilneQB thanks for the amazing comments as always!
**Re: General Concept**
100% agree that we don’t need experiment as an abstraction. I wrote “we can” but I also don’t think “we should” do it. I’d be interested to see if any user has any legitimate use case after trying our workflow. It’s nice to have an escape hatch in the design.
**Re: Milestone 1**
> How to mark which datasets are tracked on kedro viz
Yeah, actually this is a great point. Let me bring it up with @GabrielComymQB tomorrow. We can do something similar to what we do for parameters.
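For illustration only, one way Kedro-Viz could flag tracked datasets, analogous to how it already special-cases parameters; the class names and the allow-list below are hypothetical, not a real API:

```python
# Hypothetical check Kedro-Viz could use to mark "tracked" datasets in
# the graph, mirroring the existing special-casing of parameters.
# TRACKING_DATASET_TYPES is an illustrative allow-list only.
TRACKING_DATASET_TYPES = {"MetricsDataSet", "JSONDataSet"}

def is_tracked(dataset) -> bool:
    return type(dataset).__name__ in TRACKING_DATASET_TYPES
```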
**Metrics Plot**
**Re: Milestone 2**
**Session Type**
I think I'm specifically discussing the data type here when we represent the session in the viz database. For experiment tracking purposes, we only care about `run` vs `non-run` sessions, so I'm thinking of just setting other session types to `null` for now, including CLI sessions. For the CLI, I don't know how granular we want to be, e.g. do we want to split `cli` and `jupyter` even though we launch `jupyter` through the CLI?
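As a sketch of what this could look like in the viz database (the schema below is illustrative, using SQLite from the Python standard library):

```python
import sqlite3

# Illustrative session table: session_type is nullable, and only "run"
# sessions matter for experiment tracking; CLI/Jupyter sessions would
# get NULL for now, per the discussion above.
conn = sqlite3.connect("session_store.db")
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS sessions (
        id           TEXT PRIMARY KEY,  -- e.g. the session timestamp
        session_type TEXT,              -- 'run' or NULL (non-run) for now
        data         TEXT               -- JSON blob of session attributes
    )
    """
)
conn.commit()
```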
**Scalability of querying by metrics**

This touches on a design iteration that I haven't mentioned. If we want to query by metrics, we need a metrics-friendly search index. At the very least, we need to set up an index in SQLite to do it (https://www.tutorialspoint.com/sqlite/sqlite_indexes.htm), but there are other solutions, including an in-memory search index where we pay the cost up front when starting viz, or even a full-blown disk-based search index such as Whoosh (https://whoosh.readthedocs.io/en/latest/index.html). There are pros and cons to each approach. I will write a separate design doc just for the metrics query, but that will be a later iteration.
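As a rough illustration of the SQLite-index option (the table and column names are mine, not a settled schema):

```python
import sqlite3

# Sketch: store metrics as (run_id, name, value) rows and index
# (name, value) so a filter like "accuracy >= 0.8" avoids a full scan.
conn = sqlite3.connect("session_store.db")
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS run_metrics (
        run_id TEXT,
        name   TEXT,   -- e.g. 'accuracy'
        value  REAL
    )
    """
)
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_metric ON run_metrics (name, value)"
)

matching_runs = conn.execute(
    "SELECT run_id FROM run_metrics WHERE name = ? AND value >= ?",
    ("accuracy", 0.8),
).fetchall()
```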
**Scalability of many runs**
Since this was still being (visually) designed when I wrote the tech design, I didn't put it in. But I absolutely agree with you that the ability to find runs in a long list is essential. In the first iteration, from a product point of view, our solution is to let users search and filter the run list, e.g. with a metrics expression like `accuracy>=0.8`.

In terms of technical performance, I'm still weighing the pros and cons of performing the search client-side vs server-side. But I know for a fact that we can do text search client-side for up to thousands of rows easily. For millions of rows, we could employ an embedded in-memory search index like LokiJS (https://github.com/techfort/LokiJS) to help. I'm still debating, though.