Develop MVP of model bundle
See original GitHub issue

Is your feature request related to a problem? Please describe.
Thanks for the interesting technical discussion with @ericspod @wyli @atbenmurray @rijobro. As we still have many unclear requirements and unknown use cases, we plan to develop the model package feature step by step, and may adjust the design based on feedback during development.

For the initial step, the core team agreed to develop a small but typical inference example first. It will use JSON config files to define environments, components, and the workflow, and will save the config and model into a TorchScript model, so that other projects can easily reconstruct the exact same Python program and parameters to reproduce the inference. When the small MVP is ready, we will share it and discuss next steps within the team.

I will try to implement the MVP referring to existing solutions such as NVIDIA Clara MMAR, the Ignite package, etc. Basic task steps:
- Include metadata (env / sys info, changelog, version, input / output data format, etc), configs, model weights, etc. in a model package example for review. PR: https://github.com/Project-MONAI/tutorials/pull/487
- Search specified Python packages and build Python instances from a dictionary config with `name` / `path` / `args`. PR: https://github.com/Project-MONAI/MONAI/pull/3720
- Recursively parse configs in a dictionary with dependencies and executable code, for example:
  `{"dataset": {"<name>": "Dataset", "<args>": {"data": "$load_datalist()"}}, "dataloader": {"<name>": "DataLoader", "<args>": {"data": "@dataset"}}}`
  PRs: https://github.com/Project-MONAI/MONAI/pull/3818, https://github.com/Project-MONAI/MONAI/pull/3822
- Add support to save the raw config dictionaries into the TorchScript model. PR: Project-MONAI/MONAI#3138
- Add a schema mechanism to verify the folder structure, metadata.json content, etc. Refer to: https://json-schema.org/, https://github.com/Julian/jsonschema. PR: https://github.com/Project-MONAI/MONAI/pull/3865
- Add support to verify network input / output with fake data (the data spec comes from `metadata`).
- Add a mechanism to easily load JSON config files with overrides (YAML support can be added in the future), similar to this example: https://hydra.cc/docs/tutorials/basic/your_first_app/config_groups/. PR: https://github.com/Project-MONAI/MONAI/pull/3832
- Add support to reference other config items in the same config file or in other config files, referring to Hydra's ideas: https://hydra.cc/docs/advanced/overriding_packages/.
- Complete the inference example MMAR for spleen segmentation task. PR: https://github.com/Project-MONAI/tutorials/pull/604
- Write the user manual and detailed documentation. PRs: https://github.com/Project-MONAI/MONAI/pull/3834, https://github.com/Project-MONAI/MONAI/pull/3982
- [Optional] Upload the spleen MMAR example to Hugging Face (https://github.com/Project-MONAI/MONAI/discussions/3451).
- [Optional] Support relative config levels in the reference ID, for example `"test": "@###data#1"`, where `#` means the current level, `##` means the upper level, etc. PR: https://github.com/Project-MONAI/MONAI/pull/3974
- [Optional] Support customized `ConfigItem` and `ReferenceResolver` in the `ConfigParser`. PR: https://github.com/Project-MONAI/MONAI/pull/3980/
- Add a `_requires_` keyword for config components (https://github.com/Project-MONAI/MONAI/issues/3942).
- Support import statements in bundle configs (https://github.com/Project-MONAI/MONAI/issues/3966).
- Support configuring Python logging properties in a file. PR: https://github.com/Project-MONAI/MONAI/pull/3994
- [Optional] Specify rank IDs for components to run only on certain ranks, for example, saving the checkpoint only on rank 0.
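To make the config-parsing step above concrete, here is a minimal sketch (not the actual MONAI `ConfigParser` API) of how a dictionary config with `<name>`/`<args>` components, `@` references, and `$` expressions could be resolved into Python objects. `types.SimpleNamespace` stands in for a real class such as `monai.data.Dataset`:

```python
# Illustrative resolver for "<name>"/"<args>" config dictionaries.
import importlib

def instantiate(cfg, resolved):
    """Recursively build Python objects from a config fragment."""
    if isinstance(cfg, str):
        if cfg.startswith("@"):                    # reference to an already-resolved item
            return resolved[cfg[1:]]
        if cfg.startswith("$"):                    # evaluate a Python expression
            return eval(cfg[1:], {}, dict(resolved))
        return cfg
    if isinstance(cfg, dict) and "<name>" in cfg:  # instantiate class "<name>"(**"<args>")
        module, _, name = cfg["<name>"].rpartition(".")
        cls = getattr(importlib.import_module(module or "builtins"), name)
        kwargs = {k: instantiate(v, resolved) for k, v in cfg.get("<args>", {}).items()}
        return cls(**kwargs)
    if isinstance(cfg, dict):
        return {k: instantiate(v, resolved) for k, v in cfg.items()}
    return cfg

config = {
    "datalist": "$list(range(3))",
    "dataset": {"<name>": "types.SimpleNamespace", "<args>": {"data": "@datalist"}},
}
resolved = {}
for key, item in config.items():   # items are resolved in declaration order
    resolved[key] = instantiate(item, resolved)

print(resolved["dataset"].data)    # [0, 1, 2]
```

The real implementation also needs to handle dependency ordering and cycle detection; this sketch assumes items are declared before they are referenced.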
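The "save the raw config dictionaries into the TorchScript model" step can use the `_extra_files` argument of `torch.jit.save`/`torch.jit.load`. A sketch, where the tiny module is a placeholder for a real network and the file name `bundle.ts` is arbitrary:

```python
# Embed the raw config alongside the weights in one TorchScript archive.
import json
import torch

class Net(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2.0

scripted = torch.jit.script(Net())
config = {"dataset": {"<name>": "Dataset", "<args>": {"data": "$load_datalist()"}}}

# save weights + config together
torch.jit.save(scripted, "bundle.ts", _extra_files={"config.json": json.dumps(config)})

# reload both; _extra_files values are filled in-place on load
extra = {"config.json": ""}
model = torch.jit.load("bundle.ts", _extra_files=extra)
restored = json.loads(extra["config.json"])
```

Any consumer that can read a TorchScript archive can then recover the exact config used to build the inference program.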
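For the schema-verification step, the `jsonschema` package referenced above can validate metadata.json directly. The schema fields shown here are illustrative, not the official bundle schema:

```python
# Validate a (hypothetical) metadata.json against a JSON Schema.
from jsonschema import ValidationError, validate

metadata_schema = {
    "type": "object",
    "required": ["version", "network_data_format"],
    "properties": {
        "version": {"type": "string"},
        "changelog": {"type": "object"},
        "network_data_format": {
            "type": "object",
            "required": ["inputs", "outputs"],
        },
    },
}

good = {"version": "0.1.0", "network_data_format": {"inputs": {}, "outputs": {}}}
bad = {"version": "0.1.0", "network_data_format": {"inputs": {}}}  # missing "outputs"

validate(instance=good, schema=metadata_schema)  # passes silently
try:
    validate(instance=bad, schema=metadata_schema)
    ok = False
except ValidationError:
    ok = True   # the missing "outputs" key is reported
```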
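The override-loading step could look roughly like the following sketch, similar in spirit to Hydra. The `#`-separated key path mirrors the reference style used elsewhere in this design; all names here are illustrative:

```python
# Load a JSON config and apply command-line style overrides on nested keys.
import json

def load_with_overrides(config_text: str, overrides: dict) -> dict:
    """Parse a JSON config and set nested keys given as 'a#b#c' paths."""
    config = json.loads(config_text)
    for path, value in overrides.items():
        node = config
        keys = path.split("#")
        for key in keys[:-1]:
            node = node[key]
        node[keys[-1]] = value
    return config

base = '{"trainer": {"max_epochs": 100, "lr": 0.001}}'
cfg = load_with_overrides(base, {"trainer#max_epochs": 5})
print(cfg["trainer"]["max_epochs"])   # 5
```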
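One plausible reading of the relative reference IDs mentioned above: for an item whose full ID is `net#layers#0#conv`, a reference starting with one `#` points at a sibling on the same level, and each extra `#` climbs one level up. This helper is illustrative only, not MONAI's actual resolver:

```python
# Resolve a relative reference ID against the current item's full ID.
def resolve_relative(ref: str, current_id: str) -> str:
    """Turn a relative ID like '##data#1' into an absolute one."""
    if not ref.startswith("#"):
        return ref                             # already absolute
    ups = len(ref) - len(ref.lstrip("#"))      # number of leading '#'
    body = ref[ups:]
    parts = current_id.split("#")
    return "#".join(parts[: len(parts) - ups] + [body])

print(resolve_relative("#bias", "net#layers#0#conv"))      # net#layers#0#bias
print(resolve_relative("###data#1", "net#layers#0#conv"))  # net#data#1
```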
Issue Analytics
- Created: 2 years ago
- Reactions: 3
- Comments: 38 (36 by maintainers)
Top GitHub Comments
To recap where we are with existing issues/PRs:
Related issues:
We should discuss a clear definition of requirements and objectives. We want to define a format, as a single file or multiple files, which contains at least the model weights plus secondary information describing how to use them for various use cases. This will allow a human or a program to determine what sort of model it is, how to use it, and what tasks it is suited for. For our MVP we want to consider, as a starting position, what model weight storage and metadata storage would look like and whether it would achieve that objective to some degree.
The base level set of requirements I would suggest are:
One use case for this information is a human user looking into how the model is used for a particular task. They would want a clear idea of what inputs are expected and what the outputs mean. Whatever format this information takes, it should be either easily read by a human or easily converted into a convenient format using included tools.
A second use case is a deployment environment which automatically constructs whatever infrastructure is needed around a model to present it through a standard interface. This would require generating transform sequences automatically to pre- and post-process data, and loading the model through some script defining the workflow. This could be used by MONAI Deploy to automatically generate a MAP from the package, by another script to automatically create a Docker image serving the model as a command-line tool or through Flask, or by another script to interface with existing hosting services to upload the model and whatever other information is needed.
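To make both use cases concrete, a hypothetical metadata.json might look like the following. Every field name and value here is an assumption for illustration, not a finalized schema:

```json
{
  "version": "0.1.0",
  "changelog": {"0.1.0": "initial MVP"},
  "task": "Spleen segmentation from CT images",
  "network_data_format": {
    "inputs": {
      "image": {
        "type": "image",
        "format": "NIfTI",
        "num_channels": 1,
        "spatial_shape": [96, 96, 96]
      }
    },
    "outputs": {
      "pred": {
        "type": "image",
        "num_channels": 2,
        "spatial_shape": [96, 96, 96]
      }
    }
  }
}
```

A human can read this directly to learn what the model expects and produces, while a deployment tool can consume the same fields to generate fake validation inputs, pre/post-processing transforms, or a serving interface.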