Documentation: Loading pipelines for inference

Reading about saving and loading, I find it hard to understand how to save and load a model in order to use it for inference. In particular, it’s not clear to me how the setup() phase relates to saving and loading.

This page gives an overview of the lifecycle of a model.
Here it is implied that ‘setup()’ occurs after load. However, it does not seem like the setup method is being called anywhere, except when fitting a model.

It would be nice to have documentation on how the lifecycle works for inference. For example:

Am I supposed to call setup() manually, after load? If so,
- how do I recursively set up a pipeline without implementing it myself?
- is setup supposed to be run on a loaded state? This requires care, as one could easily overwrite loaded state in setup. This should be documented.
Am I supposed to run setup() before load?
- this seems unlikely looking at the flowchart
- it does not work with the ‘load()’ method in ExecutionContext, which returns a new instance
Am I supposed to not run setup() before inference?
- I suspect this is the idea
- it would be nice to have documentation on this, with an example. It has implications for how saving, loading and setup need to be written.
- I can’t make sense of the flowchart if this is the case.

Issue Analytics

State:
Created 2 years ago
Comments:5 (3 by maintainers)

Top GitHub Comments

vincent-antaki commented, May 6, 2021

Hello Joel!

Here is some useful information with regards to your question:

You usually shouldn’t have to call setup manually, although you can if you want to.
You are not supposed to run setup before load.
It is expected that, on the first call of setup, the attribute self.is_initialized (which is False after the constructor call) is set to True. If you want to avoid overwriting state on a later setup call, all you need to do is add a condition on self.is_initialized before the sensitive block of code.
All steps that have children (MetaSteps and TruncableSteps) are supposed to recursively call setup on their children, with the notable exception of Pipeline instances.
Pipeline instances are a special case: the call to setup on their children is delayed until right before fit is called on those children. This is important for steps that can’t be set up right away (e.g. when some intermediate value needs to be computed before a later step’s setup method is called).
If Pipeline is not the root of your ML pipeline, then calling setup manually (or adding a call to it in your fit function) is expected. Regardless, setup will be called on fit when a Pipeline instance is reached.
In the current implementation of saving and loading, all steps are supposed to have setup called before saving. If self.is_initialized is False (i.e. the setup method has not been called) for a given step of the pipeline at the moment of saving, a call to the setup method will be forced. Note that this may change at some point, as we intend that self.is_initialized == True will no longer be required to save a step, thus allowing a save even if the pipeline hasn’t had a fit call yet (see #470).
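
To make the self.is_initialized guard and the recursive setup on children concrete, here is a minimal sketch. It uses plain Python stand-in classes, not Neuraxle's own base classes: ToyStep, ToyWrapper, the table attribute and the setup_calls counter are all hypothetical names chosen for illustration, and the "loaded" step is simulated by setting attributes by hand rather than by going through a real saver.

```python
class ToyStep:
    """Stand-in for a pipeline step (hypothetical, not Neuraxle's BaseStep)."""

    def __init__(self):
        self.is_initialized = False  # False right after the constructor
        self.table = None            # state that setup() creates
        self.setup_calls = 0         # only here to observe repeated calls

    def setup(self, context=None):
        self.setup_calls += 1
        if not self.is_initialized:
            # Sensitive block: runs on the first call only, so state that
            # was restored by a loader is not overwritten.
            self.table = {}
            self.is_initialized = True
        return self


class ToyWrapper(ToyStep):
    """Stand-in for a step with children that sets them up recursively."""

    def __init__(self, children):
        super().__init__()
        self.children = list(children)

    def setup(self, context=None):
        super().setup(context)
        for child in self.children:
            child.setup(context)
        return self


# Simulate a step whose state was restored from disk (assumption: a loader
# would leave is_initialized set to True, as it was at save time).
loaded = ToyStep()
loaded.table = {"a": 1}
loaded.is_initialized = True

fresh = ToyStep()
wrapper = ToyWrapper([loaded, fresh])
wrapper.setup()
```

After wrapper.setup(), the loaded child keeps its restored table while the fresh child gets an empty one, which is the behaviour the guard is meant to provide.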

Side note: I think the flowchart may be getting a bit old.

Specifically with regard to your 3 options, the third one is the intended usage. It is expected that setup is only called through pipeline fit calls. From there, here are a few options you have:

You could write a custom Saver (see BaseSaver) for your step that needs setup after load.
You could add a call to setup in your transform function.
You could use an apply call (e.g. pipeline.apply('setup', context=context)) to force a call to setup on every step in the pipeline. I’d recommend avoiding this option though, as a step’s setup function might be called multiple times (once through its own apply call, and several more times through its parents’ apply calls).
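
The second option (calling setup from transform) could look roughly like the sketch below. This is a plain Python illustration under stated assumptions, not Neuraxle code: LazySetupStep, factor and cache are hypothetical names, and the class does not inherit from any Neuraxle base class. Because setup is guarded by is_initialized, calling it on every transform is cheap and does not reset existing state.

```python
class LazySetupStep:
    """Hypothetical step that sets itself up on first use at inference time."""

    def __init__(self, factor):
        self.factor = factor
        self.is_initialized = False
        self.cache = None  # resource that only setup() creates

    def setup(self, context=None):
        if not self.is_initialized:
            self.cache = {}
            self.is_initialized = True
        return self

    def transform(self, data_inputs):
        # Safe to call every time: the guard makes repeated calls a no-op.
        self.setup()
        outputs = []
        for x in data_inputs:
            if x not in self.cache:
                self.cache[x] = x * self.factor
            outputs.append(self.cache[x])
        return outputs


# No fit call and no explicit setup call before inference.
step = LazySetupStep(factor=3)
result = step.transform([1, 2, 2])
```

The same guard also limits the damage from the third option: if setup ends up being called several times through nested apply calls, only the first call does any work.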

Overall, I agree with you that setup is poorly documented and might need to be revisited eventually.

Feel free to ask more questions if you have any, I’ll be glad to help you. Cheers!

stale[bot] commented, Nov 16, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs in the next 180 days. Thank you for your contributions.
