Develop MVP of model bundle
See original GitHub issue

Is your feature request related to a problem? Please describe.
Thanks for the interesting technical discussion with @ericspod @wyli @atbenmurray @rijobro. As we still have many unclear requirements and unknown use cases, we plan to develop the model package feature step by step, and may adjust the design based on feedback during development.

For the initial step, the core team agreed to develop a small but typical inference example first. It will use JSON config files to define environments, components, and the workflow, and will save the config and model into a TorchScript model, so that other projects can easily reconstruct the exact same Python program and parameters to reproduce the inference. When the small MVP is ready, we will share it and discuss next steps within the team.

I will try to implement the MVP referring to existing solutions such as NVIDIA Clara MMAR, the Ignite package, etc. Basic task steps:
- Include metadata (env / sys info, changelog, version, input / output data format, etc), configs, model weights, etc. in a model package example for review. PR: https://github.com/Project-MONAI/tutorials/pull/487
- Search specified Python packages and build Python instances from a dictionary config with `name` / `path` / `args`. PR: https://github.com/Project-MONAI/MONAI/pull/3720
- Recursively parse configs in a dictionary with dependencies and executable code, for example:
  `{"dataset": {"<name>": "Dataset", "<args>": {"data": "$load_datalist()"}}, "dataloader": {"<name>": "DataLoader", "<args>": {"data": "@dataset"}}}`
  PRs: https://github.com/Project-MONAI/MONAI/pull/3818, https://github.com/Project-MONAI/MONAI/pull/3822
- Add support to save the raw config dictionaries into the TorchScript model. PR: Project-MONAI/MONAI#3138
- Add a schema mechanism to verify the folder structure, metadata.json content, etc. Refer to: https://json-schema.org/, https://github.com/Julian/jsonschema. PR: https://github.com/Project-MONAI/MONAI/pull/3865
- Add support to verify network input / output with fake data (the data spec comes from `metadata`).
- Add a mechanism to easily load JSON config files with overrides (YAML support can be added in the future), similar to this example: https://hydra.cc/docs/tutorials/basic/your_first_app/config_groups/. PR: https://github.com/Project-MONAI/MONAI/pull/3832
- Add support to reference other config items in the same config file or in other config files, referring to Hydra's ideas: https://hydra.cc/docs/advanced/overriding_packages/.
- Complete the inference example MMAR for spleen segmentation task. PR: https://github.com/Project-MONAI/tutorials/pull/604
- Write the user manual and detailed documentation. PRs: https://github.com/Project-MONAI/MONAI/pull/3834, https://github.com/Project-MONAI/MONAI/pull/3982
- [Optional] Upload the spleen MMAR example to Hugging Face (https://github.com/Project-MONAI/MONAI/discussions/3451).
- [Optional] Support relative config levels in the reference ID, for example `"test": "@###data#1"`, where `#` means the current level, `##` means the upper level, etc. PR: https://github.com/Project-MONAI/MONAI/pull/3974
- [Optional] Support customized `ConfigItem` and `ReferenceResolver` in the `ConfigParser`. PR: https://github.com/Project-MONAI/MONAI/pull/3980/
- Add a `_requires_` keyword for config components (https://github.com/Project-MONAI/MONAI/issues/3942).
- Support import statements in bundle configs (https://github.com/Project-MONAI/MONAI/issues/3966).
- Support configuring Python logging properties in a file. PR: https://github.com/Project-MONAI/MONAI/pull/3994
- [Optional] Specify rank IDs for components to run only on certain ranks, for example, saving the checkpoint only on rank 0.
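To make the config-parsing step above concrete, here is a minimal sketch (not the actual MONAI `ConfigParser` API) of how a dictionary config with `<name>`/`<args>` components, `@` references, and `$` expressions could be resolved into Python objects. `types.SimpleNamespace` stands in for a real class such as `monai.data.Dataset`:

```python
# Illustrative resolver for "<name>"/"<args>" config dictionaries.
import importlib

def instantiate(cfg, resolved):
    """Recursively build Python objects from a config fragment."""
    if isinstance(cfg, str):
        if cfg.startswith("@"):                    # reference to an already-resolved item
            return resolved[cfg[1:]]
        if cfg.startswith("$"):                    # evaluate a Python expression
            return eval(cfg[1:], {}, dict(resolved))
        return cfg
    if isinstance(cfg, dict) and "<name>" in cfg:  # instantiate class "<name>"(**"<args>")
        module, _, name = cfg["<name>"].rpartition(".")
        cls = getattr(importlib.import_module(module or "builtins"), name)
        kwargs = {k: instantiate(v, resolved) for k, v in cfg.get("<args>", {}).items()}
        return cls(**kwargs)
    if isinstance(cfg, dict):
        return {k: instantiate(v, resolved) for k, v in cfg.items()}
    return cfg

config = {
    "datalist": "$list(range(3))",
    "dataset": {"<name>": "types.SimpleNamespace", "<args>": {"data": "@datalist"}},
}
resolved = {}
for key, item in config.items():   # items are resolved in declaration order
    resolved[key] = instantiate(item, resolved)

print(resolved["dataset"].data)    # [0, 1, 2]
```

The real implementation also needs to handle dependency ordering and cycle detection; this sketch assumes items are declared before they are referenced.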
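The "save the raw config dictionaries into the TorchScript model" step can use the `_extra_files` argument of `torch.jit.save`/`torch.jit.load`. A sketch, where the tiny module is a placeholder for a real network and the file name `bundle.ts` is arbitrary:

```python
# Embed the raw config alongside the weights in one TorchScript archive.
import json
import torch

class Net(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2.0

scripted = torch.jit.script(Net())
config = {"dataset": {"<name>": "Dataset", "<args>": {"data": "$load_datalist()"}}}

# save weights + config together
torch.jit.save(scripted, "bundle.ts", _extra_files={"config.json": json.dumps(config)})

# reload both; _extra_files values are filled in-place on load
extra = {"config.json": ""}
model = torch.jit.load("bundle.ts", _extra_files=extra)
restored = json.loads(extra["config.json"])
```

Any consumer that can read a TorchScript archive can then recover the exact config used to build the inference program.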
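For the schema-verification step, the `jsonschema` package referenced above can validate metadata.json directly. The schema fields shown here are illustrative, not the official bundle schema:

```python
# Validate a (hypothetical) metadata.json against a JSON Schema.
from jsonschema import ValidationError, validate

metadata_schema = {
    "type": "object",
    "required": ["version", "network_data_format"],
    "properties": {
        "version": {"type": "string"},
        "changelog": {"type": "object"},
        "network_data_format": {
            "type": "object",
            "required": ["inputs", "outputs"],
        },
    },
}

good = {"version": "0.1.0", "network_data_format": {"inputs": {}, "outputs": {}}}
bad = {"version": "0.1.0", "network_data_format": {"inputs": {}}}  # missing "outputs"

validate(instance=good, schema=metadata_schema)  # passes silently
try:
    validate(instance=bad, schema=metadata_schema)
    ok = False
except ValidationError:
    ok = True   # the missing "outputs" key is reported
```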
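The override-loading step could look roughly like the following sketch, similar in spirit to Hydra. The `#`-separated key path mirrors the reference style used elsewhere in this design; all names here are illustrative:

```python
# Load a JSON config and apply command-line style overrides on nested keys.
import json

def load_with_overrides(config_text: str, overrides: dict) -> dict:
    """Parse a JSON config and set nested keys given as 'a#b#c' paths."""
    config = json.loads(config_text)
    for path, value in overrides.items():
        node = config
        keys = path.split("#")
        for key in keys[:-1]:
            node = node[key]
        node[keys[-1]] = value
    return config

base = '{"trainer": {"max_epochs": 100, "lr": 0.001}}'
cfg = load_with_overrides(base, {"trainer#max_epochs": 5})
print(cfg["trainer"]["max_epochs"])   # 5
```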
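One plausible reading of the relative reference IDs mentioned above: for an item whose full ID is `net#layers#0#conv`, a reference starting with one `#` points at a sibling on the same level, and each extra `#` climbs one level up. This helper is illustrative only, not MONAI's actual resolver:

```python
# Resolve a relative reference ID against the current item's full ID.
def resolve_relative(ref: str, current_id: str) -> str:
    """Turn a relative ID like '##data#1' into an absolute one."""
    if not ref.startswith("#"):
        return ref                             # already absolute
    ups = len(ref) - len(ref.lstrip("#"))      # number of leading '#'
    body = ref[ups:]
    parts = current_id.split("#")
    return "#".join(parts[: len(parts) - ups] + [body])

print(resolve_relative("#bias", "net#layers#0#conv"))      # net#layers#0#bias
print(resolve_relative("###data#1", "net#layers#0#conv"))  # net#data#1
```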
Issue Analytics
- Created: 2 years ago
- Reactions: 3
- Comments: 38 (36 by maintainers)
Top GitHub Comments
To recap where we are with existing issues/PRs:
Related issues:
We should discuss a clear definition of requirements and objectives. We want to define a format, as a single file or multiple files, which contains at least the model weights plus secondary information describing how to use them for various use cases. This will allow a human or a program to determine what sort of model it is, how to use it, and what tasks it is suited for. For our MVP we want to consider, as a starting position, what model weight storage and metadata storage would look like and whether it would achieve that objective to some degree.
The base level set of requirements I would suggest are:
One use case for this information is a human user looking into how the model is used for a particular task. They would want a clear idea of what inputs are expected and what the outputs mean. Whatever format this information takes, it should be either easily read by a human or easily converted into a convenient format using included tools.
A second use case is a deployment environment which automatically constructs whatever infrastructure is needed around a model to present it through a standard interface. This would require generating transform sequences automatically to pre- and post-process data, and loading the model through some script defining the workflow. This could be used by MONAI Deploy to automatically generate a MAP from the package, by another script to automatically create a Docker image serving the model as a command-line tool or through Flask, or by another script to interface with existing hosting services to upload the model and whatever other information is needed.
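To make both use cases concrete, a hypothetical metadata.json might look like the following. Every field name and value here is an assumption for illustration, not a finalized schema:

```json
{
  "version": "0.1.0",
  "changelog": {"0.1.0": "initial MVP"},
  "task": "Spleen segmentation from CT images",
  "network_data_format": {
    "inputs": {
      "image": {
        "type": "image",
        "format": "NIfTI",
        "num_channels": 1,
        "spatial_shape": [96, 96, 96]
      }
    },
    "outputs": {
      "pred": {
        "type": "image",
        "num_channels": 2,
        "spatial_shape": [96, 96, 96]
      }
    }
  }
}
```

A human can read this directly to learn what the model expects and produces, while a deployment tool can consume the same fields to generate fake validation inputs, pre/post-processing transforms, or a serving interface.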