Invoke deployed endpoint on port 8080 with gRPC
See original GitHub issue (label: kind/bug)
I deployed a model from a TensorFlow SavedModel; by default it listens for HTTP on port 8080 and for gRPC on port 9000. I tested the HTTP endpoint with a URL and headers (the Host header must be set in the request headers, otherwise it doesn't work), and it worked fine.

I then redeployed the same model with a custom YAML file so that HTTP listens on 9000 and gRPC listens on 8080, and it deployed successfully. But I don't know how to invoke the endpoint over gRPC. I have the ingress IP address, and the port is 80, which is mapped to port 8080 of the model. One more question: is it necessary to pass the host name with gRPC, the way the Host header must be set when invoking the REST API?
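For reference, a minimal client sketch in Python, assuming the `grpcio` and `tensorflow-serving-api` packages and placeholder values for the ingress IP, service host, and model name (all of which you would substitute with your own). gRPC has no Host header; its equivalent is the `:authority` pseudo-header, which the Python client lets you set through the `grpc.default_authority` channel option — so yes, under this assumption you pass the host name much like the REST Host header:

```python
def channel_options(service_host):
    """Build channel options that set the :authority pseudo-header,
    the gRPC analogue of the REST Host header."""
    return [("grpc.default_authority", service_host)]

def predict(ingress_ip, service_host, model_name, inputs):
    """Call TF Serving's PredictionService through the ingress on port 80.

    `inputs` maps input tensor names to numpy arrays / lists.
    Imports are local so the helper above stays importable without TF.
    """
    import grpc
    import tensorflow as tf
    from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

    channel = grpc.insecure_channel(f"{ingress_ip}:80",
                                    options=channel_options(service_host))
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    request = predict_pb2.PredictRequest()
    request.model_spec.name = model_name
    request.model_spec.signature_name = "serving_default"
    for key, value in inputs.items():
        request.inputs[key].CopyFrom(tf.make_tensor_proto(value))

    return stub.Predict(request, timeout=10.0)
```

The service host here would be whatever `kubectl get inferenceservice` reports as the URL host, the same value used in the REST Host header.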
Here is my YAML file:
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  annotations:
  name: "<model-name>"
spec:
  default:
    predictor:
      serviceAccountName: sa
      custom:
        container:
          name: "kfserving-container"
          args:
            - tensorflow_model_server
            - --model_base_path=/mnt/models
            - --model_name=<model>
            - --port=8080
            - --rest_api_port=9000
          env:
            - name: STORAGE_URI
              value: "<s3 path>"
          image: tensorflow/serving:latest-gpu
          resources:
            limits:
              cpu: "1"
              memory: 16Gi
            requests:
              cpu: "1"
              memory: 16Gi
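With this manifest, `--port=8080` is the gRPC port and `--rest_api_port=9000` the HTTP one. A quick way to exercise the gRPC endpoint from the command line is `grpcurl` — the sketch below assumes placeholder values for the ingress IP and service host, and that you have the `tensorflow_serving` proto files checked out locally (TF Serving does not enable gRPC server reflection, so `grpcurl` needs the protos). The `-authority` flag sets the `:authority` pseudo-header, the gRPC counterpart of the REST Host header:

```shell
INGRESS_IP=1.2.3.4                              # assumption: your cluster ingress IP
SERVICE_HOST=<model-name>.default.example.com   # assumption: host reported by `kubectl get inferenceservice`

grpcurl -plaintext \
  -import-path /path/to/serving -import-path /path/to/tensorflow \
  -proto tensorflow_serving/apis/prediction_service.proto \
  -authority "$SERVICE_HOST" \
  -d '{"model_spec": {"name": "<model>"}, "inputs": {...}}' \
  "$INGRESS_IP:80" \
  tensorflow.serving.PredictionService/Predict
```

This is a sketch, not a verified invocation: the inputs payload and proto paths must be filled in for your model.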
Here are the logs of my pod:
usage: tensorflow_model_server
2020-11-25 07:15:05.248383: I tensorflow_serving/model_servers/server.cc:87] Building single TensorFlow model file config: model_name: <model-name> model_base_path: /mnt/models
Flags:
--port=8500 int32 Port to listen on for gRPC API
--grpc_socket_path="" string If non-empty, listen to a UNIX socket for gRPC API on the given path. Can be either relative or absolute path.
--rest_api_port=0 int32 Port to listen on for HTTP/REST API. If set to zero HTTP/REST API will not be exported. This port must be different than the one specified in --port.
--rest_api_num_threads=16 int32 Number of threads for HTTP/REST API processing. If not set, will be auto set based on number of CPUs.
--rest_api_timeout_in_ms=30000 int32 Timeout for HTTP/REST API calls.
--enable_batching=false bool enable batching
--allow_version_labels_for_unavailable_models=false bool If true, allows assigning unused version labels to models that are not available yet.
--batching_parameters_file="" string If non-empty, read an ascii BatchingParameters protobuf from the supplied file name and use the contained values instead of the defaults.
--model_config_file="" string If non-empty, read an ascii ModelServerConfig protobuf from the supplied file name, and serve the models in that file. This config file can be used to specify multiple models to serve and other advanced parameters including non-default version policy. (If used, --model_name, --model_base_path are ignored.)
--model_config_file_poll_wait_seconds=0 int32 Interval in seconds between each poll of the filesystemfor model_config_file. If unset or set to zero, poll will be done exactly once and not periodically. Setting this to negative is reserved for testing purposes only.
--model_name="default" string name of model (ignored if --model_config_file flag is set)
--model_base_path="" string path to export (ignored if --model_config_file flag is set, otherwise required)
--max_num_load_retries=5 int32 maximum number of times it retries loading a model after the first failure, before giving up. If set to 0, a load is attempted only once. Default: 5
--load_retry_interval_micros=60000000 int64 The interval, in microseconds, between each servable load retry. If set negative, it doesn't wait. Default: 1 minute
--file_system_poll_wait_seconds=1 int32 Interval in seconds between each poll of the filesystem for new model version. If set to zero poll will be exactly done once and not periodically. Setting this to negative value will disable polling entirely causing ModelServer to indefinitely wait for a new model at startup. Negative values are reserved for testing purposes only.
--flush_filesystem_caches=true bool If true (the default), filesystem caches will be flushed after the initial load of all servables, and after each subsequent individual servable reload (if the number of load threads is 1). This reduces memory consumption of the model server, at the potential cost of cache misses if model files are accessed after servables are loaded.
--tensorflow_session_parallelism=0 int64 Number of threads to use for running a Tensorflow session. Auto-configured by default.Note that this option is ignored if --platform_config_file is non-empty.
--tensorflow_intra_op_parallelism=0 int64 Number of threads to use to parallelize the executionof an individual op. Auto-configured by default.Note that this option is ignored if --platform_config_file is non-empty.
--tensorflow_inter_op_parallelism=0 int64 Controls the number of operators that can be executed simultaneously. Auto-configured by default.Note that this option is ignored if --platform_config_file is non-empty.
--ssl_config_file="" string If non-empty, read an ascii SSLConfig protobuf from the supplied file name and set up a secure gRPC channel
--platform_config_file="" string If non-empty, read an ascii PlatformConfigMap protobuf from the supplied file name, and use that platform config instead of the Tensorflow platform. (If used, --enable_batching is ignored.)
2020-11-25 07:15:05.248564: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
2020-11-25 07:15:05.248577: I tensorflow_serving/model_servers/server_core.cc:575] (Re-)adding model: tracking
2020-11-25 07:15:05.348793: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: tracking version: 1}
2020-11-25 07:15:05.348814: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: tracking version: 1}
2020-11-25 07:15:05.348822: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: tracking version: 1}
2020-11-25 07:15:05.348850: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /mnt/models/1
2020-11-25 07:15:05.357807: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2020-11-25 07:15:05.357836: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:234] Reading SavedModel debug info (if present) from: /mnt/models/1
2020-11-25 07:15:05.357919: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-25 07:15:05.359175: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-11-25 07:15:05.362778: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-25 07:15:05.363092: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:00:1e.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
2020-11-25 07:15:05.363104: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-11-25 07:15:05.363147: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-25 07:15:05.363447: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-25 07:15:05.363720: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-25 07:15:06.234968: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-25 07:15:06.234994: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2020-11-25 07:15:06.234999: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2020-11-25 07:15:06.235103: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-25 07:15:06.235413: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-25 07:15:06.235701: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-25 07:15:06.236017: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 249 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5)
2020-11-25 07:15:06.253277: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:199] Restoring SavedModel bundle.
2020-11-25 07:15:06.253319: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:209] The specified SavedModel has no variables; no checkpoints were restored. File does not exist: /mnt/models/1/variables/variables.index
2020-11-25 07:15:06.253339: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:303] SavedModel load for tags { serve }; Status: success: OK. Took 904488 microseconds.
2020-11-25 07:15:06.253753: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:59] No warmup data file found at /mnt/models/1/assets.extra/tf_serving_warmup_requests
2020-11-25 07:15:06.253878: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: <model-name> version: 1}
2020-11-25 07:15:06.255357: I tensorflow_serving/model_servers/server.cc:367] Running gRPC ModelServer at 0.0.0.0:8080 ...
[warn] getaddrinfo: address family for nodename not supported
[evhttp_server.cc : 238] NET_LOG: Entering the event loop ...
2020-11-25 07:15:06.256714: I tensorflow_serving/model_servers/server.cc:387] Exporting HTTP/REST API at:localhost:9000 ...
Issue analytics: created 3 years ago; 1 reaction; 7 comments (2 by maintainers).
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
@daganida88 @akash-harijan Please check the tensorflow gRPC example
I am also using v1alpha2 and not v1beta1… it says the InferenceService resource does not exist in v1beta1.