Develop experiment management module
Is your feature request related to a problem? Please describe.
To record and track training experiments clearly, an experiment management module is necessary.
- Identify the typical user stories
- Identify the features we should support
- ~~Design the module and APIs, which can easily support different backends, like MLFlow, AIM, etc.~~
- Try to apply MLFlow in the Auto3DSeg application.
Issue Analytics
- State:
- Created a year ago
- Comments: 5 (5 by maintainers)
Top GitHub Comments
Hi @Nic-Ma @binliunls @dongyang0122 @ericspod @wyli
Here are some of my thoughts about MLFlow for Auto3DSeg for the MONAI v1.1 release, from two perspectives: user experience and implementation. Thanks!
Two new arguments are proposed: `train_local` and `tracking_url`. If the user wants to run all trainings locally, `train_local` should be True and `tracking_url` should be set to `localhost`; the MLFlow server will then start locally right after BundleGen/AlgoGen. If the server is meant to be remote, a message is printed telling the user to start the service remotely; it is the user's job to run the server on the remote machine.
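As a sketch, the local/remote choice above might map to an MLflow tracking URI like this. The helper name is hypothetical and port 5000 is simply MLflow's default server port; none of this is MONAI API:

```python
# Hypothetical helper: map the proposed train_local / tracking_url arguments
# to an MLflow tracking URI. Port 5000 is MLflow's default server port.
def resolve_tracking_uri(train_local: bool, tracking_url: str) -> str:
    if train_local and tracking_url == "localhost":
        # The MLflow server would be started locally right after BundleGen/AlgoGen.
        return "http://localhost:5000"
    # Remote case: it is the user's job to start the server on that machine.
    print(f"Please start the MLflow tracking server on {tracking_url} yourself.")
    return f"http://{tracking_url}:5000"
```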
Users call `algo.train()` to start trainings with experiment management ON or OFF, passing `enable_mlflow`, `tracking_url`, `experiment_name`, `params`, `metrics`, and so on. Optionally, they can use `algo._create_cmd()` to see the command to run. Below are some drafts of the MLFlow-related arguments for the training to take:
- `enable_mlflow`: use mlflow as the backend
- `tracking_url`: localhost or a remote IP address for the mlflow server
- `experiment_name`: required by mlflow
- `params`: a set of keys to log in training (before the iterations)
- `metrics`: a set of keys to log in training (during the iterations)
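The argument drafts above might be passed along these lines. This is only a hypothetical call shape; `algo.train` accepting these keyword names is a proposal in this issue, not a released MONAI API:

```python
# Hypothetical usage of the proposed arguments (draft API from this issue).
train_kwargs = {
    "enable_mlflow": True,            # use mlflow as the backend
    "tracking_url": "localhost",      # local or remote mlflow server
    "experiment_name": "auto3dseg",   # required by mlflow
    "params": ["max_epochs", "lr"],   # keys logged once, before the iterations
    "metrics": ["loss", "val_dice"],  # keys logged during the iterations
}
# algo.train(**train_kwargs)          # proposed usage
# print(algo._create_cmd())           # optionally inspect the command to run
```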
An abstract `ExperimentManager` base class is proposed, with `MLFlowExperimentManager` as its only subclass in MONAI 1.1:
- `MLFlowExperimentManager` can initiate the server locally and record where it keeps the database. It can print a helper message if the server is to be started remotely. (Should the local server use SQLite as the backend?)
- `MLFlowExperimentManager` manages `experiment_name` and `run_name`.
- `MLFlowExperimentManager` manages a list of `params` names to log (via `log_params` in mlflow) and another list of `metrics` names to log (via `log_metrics` in mlflow). Note that the keys must match variable names exactly: to log `max_epochs` in the param buffer, the variable in `train.py` has to be `max_epochs`; it can't be `total_epochs` or `num_epochs`.
- `MLFlowExperimentManager` watches `params` and `metrics` during the running of `train.py`. If a key is the name of a variable, it triggers the `mlflow.log_metrics` or `mlflow.log_params` call wrapped inside the `MLFlowExperimentManager`.
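The hierarchy above could be sketched roughly as follows. The class names come from this issue, but the method signatures, the `watch` helper, and the in-memory buffers are my assumptions; a real implementation would forward to `mlflow.log_params` / `mlflow.log_metrics` instead of buffering:

```python
# Sketch only: class names are from the issue, everything else is assumed.
from abc import ABC, abstractmethod

class ExperimentManager(ABC):
    """Backend-agnostic base class for experiment tracking."""

    @abstractmethod
    def log_params(self, params: dict) -> None: ...

    @abstractmethod
    def log_metrics(self, metrics: dict, step: int = 0) -> None: ...

class MLFlowExperimentManager(ExperimentManager):
    def __init__(self, experiment_name, run_name=None, param_keys=(), metric_keys=()):
        self.experiment_name = experiment_name
        self.run_name = run_name
        self.param_keys = set(param_keys)    # logged once, before the iterations
        self.metric_keys = set(metric_keys)  # logged during the iterations
        self.logged_params, self.logged_metrics = {}, []

    def log_params(self, params: dict) -> None:
        # A real implementation would call mlflow.log_params(params) here.
        self.logged_params.update(params)

    def log_metrics(self, metrics: dict, step: int = 0) -> None:
        # A real implementation would call mlflow.log_metrics(metrics, step=step).
        self.logged_metrics.append((step, metrics))

    def watch(self, variables: dict, step: int = 0) -> None:
        # Keys must match variable names exactly: `max_epochs` is forwarded
        # only if train.py names its variable `max_epochs`.
        self.log_params({k: v for k, v in variables.items() if k in self.param_keys})
        self.log_metrics({k: v for k, v in variables.items() if k in self.metric_keys}, step)
```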
I'm starting to write bundles which choose a new output directory every time the training script is invoked, so that runs get placed in unique locations. I want to direct the loggers to a log file in that directory, but it would also be good to write out the current configuration the bundle is using, so that one can see what changed from one run to the next. This won't include any auxiliary code the bundle uses, but it would get most of the way toward keeping track of what environment generated the data in that directory. It is also lighter weight than tools like mlflow and would suit environments where those can't be used.
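A minimal sketch of this lightweight approach, assuming a timestamped directory name and a JSON config file (both are my choices, not anything prescribed by the bundle format):

```python
# Lightweight run tracking: a fresh output directory per invocation, with the
# bundle's current configuration written alongside the logs for later diffing.
import json
import time
from pathlib import Path

def make_run_dir(root: str, config: dict) -> Path:
    run_dir = Path(root) / time.strftime("run_%Y%m%d_%H%M%S")
    run_dir.mkdir(parents=True, exist_ok=True)
    # Record the configuration so one run can be compared with the next.
    (run_dir / "config.json").write_text(json.dumps(config, indent=2))
    return run_dir
```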