Create a MlflowMetricDataSet
Context
As of today, kedro-mlflow offers a clear way to log parameters (through a Hook) and artifacts (through the MlflowArtifactDataSet class in the catalog.yml).
However, there is no well-defined way to log metrics automatically to mlflow within the plugin. The user still has to log metrics directly by calling log_metric within their self-defined functions. This is neither convenient nor parametrizable, and it makes the code less portable and messier.
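For illustration, this is roughly what manual metric logging looks like today inside a node function (the function, model, and variable names below are hypothetical, not taken from the plugin):

```python
import mlflow


def evaluate_model(model, X_test, y_test):
    """Hypothetical Kedro node: the user must call mlflow.log_metric by hand."""
    accuracy = model.score(X_test, y_test)
    # Manual, hard-coded logging call inside the node body: this is the
    # pattern the issue proposes to replace with a catalog-level dataset.
    mlflow.log_metric("accuracy", accuracy)
    return accuracy
```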
Feature description
Provide a single, well-defined way to log metrics through the plugin.
Possible Implementation
The easiest implementation would be to create a MlflowMetricDataSet, very similar to MlflowArtifactDataSet, to enable logging the metric directly in the catalog.yml.
The main problem with this approach is that some metrics evolve over time, and we would like to log the metric on each update. This is not possible here because the updates are made inside the node (while it is running), whereas the catalog only saves the node's outputs once it has finished.
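For concreteness, here is a minimal sketch of what such a dataset could look like, assuming Kedro's AbstractDataSet interface and a metric passed around as a single float. The class name MlflowMetricDataSet comes from the issue itself, but the body below is only an illustration, not the plugin's actual implementation:

```python
from typing import Any, Dict

import mlflow
from kedro.io import AbstractDataSet


class MlflowMetricDataSet(AbstractDataSet):
    """Sketch: log a single float metric to the active mlflow run on save."""

    def __init__(self, key: str):
        self.key = key

    def _save(self, data: float) -> None:
        # Called by Kedro once the node has returned: log the value
        # against the currently active mlflow run.
        mlflow.log_metric(self.key, data)

    def _load(self) -> float:
        # Read back the latest logged value from the active run's history.
        run_id = mlflow.active_run().info.run_id
        history = mlflow.tracking.MlflowClient().get_metric_history(run_id, self.key)
        return history[-1].value

    def _describe(self) -> Dict[str, Any]:
        return {"key": self.key}
```

As noted above, because _save is only called after the node has finished, a dataset like this cannot capture intermediate values of a metric that evolves while the node is still running.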
PR: https://github.com/Galileo-Galilei/kedro-mlflow/pull/49
@kaemo @Galileo-Galilei I also have an idea; let me know what you think about it. If you can find the time, I would be happy to have a live session (chat, video chat, or another live communication channel) where we could discuss this topic.
@kaemo,
I forgot to write it, but using the most recent run for loading is completely out of the question. Indeed, I have learnt that some teams use a common mlflow instance for all data scientists (unlike my team, where each data scientist has their own instance they can manage as they want, plus a shared one for sharing models, where training is triggered by CI/CD). This leads to conflicting write issues, since several runs can be launched by different data scientists at the same time. I feel that it is a very bad setup (and they complain that their mlflow is a total mess), but it is still what they use right now, and we cannot exclude the possibility that even my team's shared mlflow could have conflicts if several runs are launched concurrently (especially when models take long to train, e.g. deep learning models).