
Unable to set InfraValidator model serving container resources


Hi

We want to set container resources on the model serving container that InfraValidator creates to run the model in a ‘sandbox’. We thought it necessary to ensure that the container uses the same resources that will be available when the model is served on a production serving instance.

We mistakenly thought component_config_overrides would be the mechanism, but of course that only applies to the TFX component itself. We have now found the code that creates the pod/container spec and requests K8s to create a pod here:

https://github.com/tensorflow/tfx/blob/1db1358fcef2d0f5571e0ca68079f1401d7fb1ef/tfx/components/infra_validator/model_server_runners/kubernetes_runner.py#L236

It is clear that this uses whatever default resources K8s chooses to assign when creating the pod. We cannot see any other configuration applied anywhere that might allow us to override this.
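
For illustration, here is roughly what that looks like with the kubernetes Python client. This is our own minimal sketch, not the actual TFX code; the point is that the V1Container is built with no resources field, so the cluster's defaults (e.g. a namespace LimitRange, if one exists) decide what the pod gets:

```python
# Illustrative sketch only (not the actual TFX code): a container spec built
# with no `resources` field, as KubernetesRunner does today. Kubernetes then
# falls back to whatever defaults the namespace's LimitRange (if any) provides.
from kubernetes import client as k8s_client

container = k8s_client.V1Container(
    name='model-server',          # illustrative name
    image='tensorflow/serving',   # illustrative image
    # resources=...  <- never set, so cluster defaults apply
)
pod = k8s_client.V1Pod(
    metadata=k8s_client.V1ObjectMeta(generate_name='infra-validation-'),
    spec=k8s_client.V1PodSpec(containers=[container], restart_policy='Never'))
```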

Maybe we misunderstand the intention or operation of InfraValidator? Or maybe this should be a feature request?

For info, this discussion started over on KF Github here: https://github.com/kubeflow/pipelines/issues/4822

I would be willing to become a contributor and implement/test this feature. Reading the InfraValidator component code, my first thought is to add a new element to the Executor exec_properties which adheres to the K8s resource spec:

https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1ResourceRequirements.md

This would be passed as an Optional dict argument all the way down the call chain to KubernetesRunner. There the dict would be validated according to:

https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

and, if valid, used to set the resources property of the V1Container:

https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/V1Container.md

here: https://github.com/tensorflow/tfx/blob/1db1358fcef2d0f5571e0ca68079f1401d7fb1ef/tfx/components/infra_validator/model_server_runners/kubernetes_runner.py#L243
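
A minimal sketch of what we have in mind is below. The _validate_resources and _build_container helpers and the plain dict mirroring the V1ResourceRequirements shape are all hypothetical; none of this exists in TFX today:

```python
# Hedged sketch of the proposed change; everything here except the
# kubernetes client types is hypothetical and just illustrates the idea.
from typing import Any, Dict, Optional

from kubernetes import client as k8s_client

_ALLOWED_KEYS = frozenset({'limits', 'requests'})


def _validate_resources(resources: Dict[str, Dict[str, str]]) -> None:
  """Hypothetical validation of a V1ResourceRequirements-shaped dict.

  See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
  """
  unknown = set(resources) - _ALLOWED_KEYS
  if unknown:
    raise ValueError(f'Unknown resource keys: {unknown}')


def _build_container(
    resources: Optional[Dict[str, Dict[str, str]]] = None
) -> k8s_client.V1Container:
  """Builds the model server container, optionally with resource settings."""
  kwargs: Dict[str, Any] = {}
  if resources is not None:
    _validate_resources(resources)
    kwargs['resources'] = k8s_client.V1ResourceRequirements(
        limits=resources.get('limits'),
        requests=resources.get('requests'))
  return k8s_client.V1Container(
      name='model-server',          # illustrative
      image='tensorflow/serving',   # illustrative
      **kwargs)


# Example: give the sandbox the same CPU/memory as the production instance.
container = _build_container(
    resources={'requests': {'cpu': '2', 'memory': '4Gi'},
               'limits': {'cpu': '2', 'memory': '4Gi'}})
```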

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 2
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

1 reaction
chongkong commented, Jan 4, 2021

Hi, I’m the author of the InfraValidator component. We don’t have configuration for specifying the Pod/Container spec yet because we haven’t had customer use cases, and thus lacked a good understanding of which configs should or shouldn’t be included. I’m glad to help with your work on this, whether through a discussion thread here or a PR. For adding configuration, we usually put configs in proto files (see tfx/proto/infra_validator.proto), and that might be a good starting point.
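
For example, once a field were added there, it could surface to users roughly like this. The resources field below is hypothetical; ServingSpec, TensorFlowServing and KubernetesConfig do already exist in tfx/proto/infra_validator.proto:

```python
# The `resources` field below is HYPOTHETICAL; the rest of the API exists.
from tfx.components import InfraValidator
from tfx.proto import infra_validator_pb2

infra_validator = InfraValidator(
    model=trainer.outputs['model'],  # assumes an upstream Trainer component
    serving_spec=infra_validator_pb2.ServingSpec(
        tensorflow_serving=infra_validator_pb2.TensorFlowServing(
            tags=['latest']),
        kubernetes=infra_validator_pb2.KubernetesConfig(
            # resources=...  # hypothetical new field carrying the
            #                # V1ResourceRequirements-shaped config
        )))
```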

And @ConverJens, sorry for not following up on #1871 earlier; I’m available to start working on this. Let’s continue the discussion on that particular thread.

0 reactions
chongkong commented, Jan 5, 2021

It is true that TFX lacks good support for local development by external developers… 😅 Assuming there are no other errors, ideally pip install would automatically run all the required Bazel build steps (for TFX this is only needed to populate the proto stub files). I’m not sure that currently works, but you can install the local repository in editable mode (pip install -e .). After that you can start developing and testing.

You also don’t have to run all the integration tests; in fact, I don’t think we have an integration test for InfraValidator on Kubernetes, so manual testing is required. When reviewing your PR, we would do the same manual testing ourselves.
