no response from model, Triton server example #1283
Istio version: v1.7.1
Knative version: v0.19
Kubernetes version: v1.17
I have managed to get the Bert example's Triton deployment working, as seen in the output below, with curl -v http://bert-v2-predictor-default.default.192.168.1.235.xip.io/v2, but I could not get any other response, either from /v2/models or from /v2/models/bert-v2. Since the InferenceService status is READY True, I am assuming there is no problem with locating/loading the models. What else can I do to investigate further?
Some status info:
$ curl -v http://bert-v2-predictor-default.default.192.168.1.235.xip.io/v2
* Trying 192.168.1.235...
* TCP_NODELAY set
* Connected to bert-v2-predictor-default.default.192.168.1.235.xip.io (192.168.1.235) port 80 (#0)
> GET /v2 HTTP/1.1
> Host: bert-v2-predictor-default.default.192.168.1.235.xip.io
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< content-length: 215
< content-type: application/json
< date: Fri, 08 Jan 2021 05:58:06 GMT
< server: istio-envoy
< x-envoy-upstream-service-time: 2
<
* Connection #0 to host bert-v2-predictor-default.default.192.168.1.235.xip.io left intact
{"name":"triton","version":"2.4.0","extensions":["classification","sequence","model_repository","schedule_policy","model_configuration","system_shared_memory","cuda_shared_memory","binary_tensor_data","statistics"]}
$ curl -v http://bert-v2-predictor-default.default.192.168.1.235.xip.io/v2/models/bert-v2
* Trying 192.168.1.235...
...
...
< HTTP/1.1 400 Bad Request
< content-length: 61
< content-type: application/json
< date: Fri, 08 Jan 2021 05:59:41 GMT
< server: istio-envoy
< x-envoy-upstream-service-time: 2
<
* Connection #0 to host bert-v2-predictor-default.default.192.168.1.235.xip.io left intact
{"error":"Request for unknown model: 'bert-v2' is not found"}
The same HTTP 400 is returned for /v2/models as well.
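The 400 suggests the server itself is up but the model name does not match anything Triton loaded. A small sketch of probing the v2 endpoints in order of specificity, using the host from the curl output above; the `v2_url` helper is purely illustrative, and the repository-index call assumes the model_repository extension (which the /v2 metadata response does advertise) is enabled:

```shell
# Illustrative helper to build v2 endpoint URLs against the host above.
HOST=bert-v2-predictor-default.default.192.168.1.235.xip.io
v2_url() { echo "http://${HOST}/v2${1:+/$1}"; }

# Server metadata (already returns 200 in the issue):
#   curl -s "$(v2_url '')"
# Model readiness -- a 400 here usually means the name does not match a
# top-level directory in the model repository:
#   curl -s "$(v2_url 'models/bert-v2/ready')"
# The model_repository extension exposes a repository index, which lists
# what Triton actually sees (note it is a POST):
#   curl -s -X POST "$(v2_url 'repository/index')"
```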
$ kubectl get inferenceservices --all-namespaces
NAMESPACE        NAME                   URL                                                         READY   AGE
default          bert-v2                http://bert-v2.default.192.168.1.235.xip.io                 True    4d7h
default          triton-simple-string   http://triton-simple-string.default.192.168.1.235.xip.io    True    7h27m
kfserving-test   sklearn-iris           http://sklearn-iris.kfserving-test.192.168.1.235.xip.io     True    8d
$ kubectl describe inferenceservice bert-v2
Name: bert-v2
Namespace: default
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"serving.kubeflow.org/v1beta1","kind":"InferenceService","metadata":{"annotations":{"sidecar.istio.io/inject":"true"},"name"...
sidecar.istio.io/inject: true
API Version: serving.kubeflow.org/v1beta1
Kind: InferenceService
Metadata:
Creation Timestamp: 2021-01-03T23:07:55Z
Finalizers:
inferenceservice.finalizers
Generation: 1
Resource Version: 5655556
Self Link: /apis/serving.kubeflow.org/v1beta1/namespaces/default/inferenceservices/bert-v2
UID: 08b2d11e-15f2-48ac-8bf7-0096d5e0959c
Spec:
Predictor:
Triton:
Name: kfserving-container
Resources:
Limits:
Cpu: 1
Memory: 8Gi
Requests:
Cpu: 1
Memory: 8Gi
Runtime Version: 20.10-py3
Storage Uri: gs://kfserving-examples/models/triton/bert
Status:
Address:
URL: http://bert-v2.default.svc.cluster.local/v2/models/bert-v2/infer
Components:
Predictor:
Address:
URL: http://bert-v2-predictor-default.default.svc.cluster.local
Latest Created Revision: bert-v2-predictor-default-00004
Latest Ready Revision: bert-v2-predictor-default-00004
Previous Ready Revision: bert-v2-predictor-default-00003
Traffic Percent: 100
URL: http://bert-v2-predictor-default.default.192.168.1.235.xip.io
Conditions:
Last Transition Time: 2021-01-06T17:36:36Z
Status: True
Type: IngressReady
Last Transition Time: 2021-01-06T17:36:36Z
Severity: Info
Status: True
Type: PredictorConfigurationReady
Last Transition Time: 2021-01-06T17:36:36Z
Status: True
Type: PredictorReady
Last Transition Time: 2021-01-06T17:36:36Z
Severity: Info
Status: True
Type: PredictorRouteReady
Last Transition Time: 2021-01-06T17:36:36Z
Status: True
Type: Ready
URL: http://bert-v2.default.192.168.1.235.xip.io
Events: <none>
$ kubectl describe pods bert-v2
Name: bert-v2-predictor-default-00004-deployment-86d4dc64fc-sgc6s
Namespace: default
Priority: 0
Node: masternode01/192.168.1.133
Start Time: Thu, 07 Jan 2021 19:51:31 +0300
Labels: app=bert-v2-predictor-default-00004
component=predictor
istio.io/rev=default
pod-template-hash=86d4dc64fc
security.istio.io/tlsMode=istio
service.istio.io/canonical-name=bert-v2-predictor-default
service.istio.io/canonical-revision=bert-v2-predictor-default-00004
serving.knative.dev/configuration=bert-v2-predictor-default
serving.knative.dev/configurationGeneration=4
serving.knative.dev/revision=bert-v2-predictor-default-00004
serving.knative.dev/revisionUID=363adc4d-bc9e-42b9-80c4-11d7ee7b90d7
serving.knative.dev/service=bert-v2-predictor-default
serving.kubeflow.org/inferenceservice=bert-v2
Annotations: autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
autoscaling.knative.dev/minScale: 1
internal.serving.kubeflow.org/storage-initializer-sourceuri: gs://kfserving-examples/models/triton/bert
prometheus.io/path: /stats/prometheus
prometheus.io/port: 15020
prometheus.io/scrape: true
serving.knative.dev/creator: system:serviceaccount:kfserving-system:default
sidecar.istio.io/inject: true
sidecar.istio.io/status:
{"version":"8e6e902b765af607513b28d284940ee1421e9a0d07698741693b2663c7161c11","initContainers":["istio-init"],"containers":["istio-proxy"]...
Status: Running
IP: 10.244.0.83
IPs:
IP: 10.244.0.83
Controlled By: ReplicaSet/bert-v2-predictor-default-00004-deployment-86d4dc64fc
Init Containers:
storage-initializer:
Container ID: docker://9864e28d8a37026689b7fb9bc1f766c9529f1af8de294515ce28f8b869ea9524
Image: gcr.io/kfserving/storage-initializer:v0.5.0-rc1
Image ID: docker-pullable://gcr.io/kfserving/storage-initializer@sha256:bd5ad7ca7a42c127f046362dcf3ab48db0475438544911fb04b29b253c6ebdcd
Port: <none>
Host Port: <none>
Args:
gs://kfserving-examples/models/triton/bert
/mnt/models
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 07 Jan 2021 19:51:32 +0300
Finished: Thu, 07 Jan 2021 20:11:07 +0300
Ready: True
Restart Count: 0
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 100m
memory: 100Mi
Environment: <none>
Mounts:
/mnt/models from kfserving-provision-location (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-lxw7l (ro)
istio-init:
Container ID: docker://8e0ffedcbd57a84a19de38290dbc8194623058d19a18ec5913ea31490d8424f0
Image: docker.io/istio/proxyv2:1.7.1
Image ID: docker-pullable://istio/proxyv2@sha256:4b6f682755956a957fd81a60ef246db79dae747278ec240752feb2c13135f322
Port: <none>
Host Port: <none>
Args:
istio-iptables
-p
15001
-z
15006
-u
1337
-m
REDIRECT
-i
*
-x
-b
*
-d
15090,15021,15020
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 07 Jan 2021 20:11:10 +0300
Finished: Thu, 07 Jan 2021 20:11:10 +0300
Ready: True
Restart Count: 0
Limits:
cpu: 2
memory: 1Gi
Requests:
cpu: 10m
memory: 10Mi
Environment:
DNS_AGENT:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-lxw7l (ro)
Containers:
kfserving-container:
Container ID: docker://f25d56d871c246227576240c21b14d6ed30018feed916aae92a187a02566cd49
Image: nvcr.io/nvidia/tritonserver:20.10-py3
Image ID: docker-pullable://nvcr.io/nvidia/tritonserver@sha256:28a458eac4d888329c9a7420032f52be27fd75fef670c99c598bca76433341c0
Port: 8080/TCP
Host Port: 0/TCP
Args:
tritonserver
--model-store=/mnt/models
--grpc-port=9000
--http-port=8080
--allow-grpc=true
--allow-http=true
State: Running
Started: Thu, 07 Jan 2021 20:11:11 +0300
Ready: True
Restart Count: 0
Limits:
cpu: 1
memory: 8Gi
Requests:
cpu: 1
memory: 8Gi
Environment:
PORT: 8080
K_REVISION: bert-v2-predictor-default-00004
K_CONFIGURATION: bert-v2-predictor-default
K_SERVICE: bert-v2-predictor-default
Mounts:
/mnt/models from kfserving-provision-location (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-lxw7l (ro)
queue-proxy:
Container ID: docker://2e7ad32448481943779df7fc770d43292efe0357638db897a40fa078464dd246
Image: gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:1a569afd4c34e285f6d647633925e2b684899bc8d01b4894047c90b75ca49357
Image ID: docker-pullable://gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:1a569afd4c34e285f6d647633925e2b684899bc8d01b4894047c90b75ca49357
Ports: 8022/TCP, 9090/TCP, 9091/TCP, 8012/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP
State: Running
Started: Thu, 07 Jan 2021 20:11:11 +0300
Ready: True
Restart Count: 0
Requests:
cpu: 25m
Readiness: exec [/ko-app/queue -probe-period 0] delay=0s timeout=10s period=10s #success=1 #failure=3
Environment:
SERVING_NAMESPACE: default
SERVING_SERVICE: bert-v2-predictor-default
SERVING_CONFIGURATION: bert-v2-predictor-default
SERVING_REVISION: bert-v2-predictor-default-00004
QUEUE_SERVING_PORT: 8012
CONTAINER_CONCURRENCY: 0
REVISION_TIMEOUT_SECONDS: 300
SERVING_POD: bert-v2-predictor-default-00004-deployment-86d4dc64fc-sgc6s (v1:metadata.name)
SERVING_POD_IP: (v1:status.podIP)
SERVING_LOGGING_CONFIG: {
"level": "info",
"development": false,
"outputPaths": ["stdout"],
"errorOutputPaths": ["stderr"],
"encoding": "json",
"encoderConfig": {
"timeKey": "ts",
"levelKey": "level",
"nameKey": "logger",
"callerKey": "caller",
"messageKey": "msg",
"stacktraceKey": "stacktrace",
"lineEnding": "",
"levelEncoder": "",
"timeEncoder": "iso8601",
"durationEncoder": "",
"callerEncoder": ""
}
}
SERVING_LOGGING_LEVEL:
SERVING_REQUEST_LOG_TEMPLATE: {"httpRequest": {"requestMethod": "{{.Request.Method}}", "requestUrl": "{{js .Request.RequestURI}}", "requestSize": "{{.Request.ContentLength}}", "status": {{.Response.Code}}, "responseSize": "{{.Response.Size}}", "userAgent": "{{js .Request.UserAgent}}", "remoteIp": "{{js .Request.RemoteAddr}}", "serverIp": "{{.Revision.PodIP}}", "referer": "{{js .Request.Referer}}", "latency": "{{.Response.Latency}}s", "protocol": "{{.Request.Proto}}"}, "traceId": "{{index .Request.Header "X-B3-Traceid"}}"}
SERVING_ENABLE_REQUEST_LOG: false
SERVING_REQUEST_METRICS_BACKEND: prometheus
TRACING_CONFIG_BACKEND: none
TRACING_CONFIG_ZIPKIN_ENDPOINT:
TRACING_CONFIG_STACKDRIVER_PROJECT_ID:
TRACING_CONFIG_DEBUG: false
TRACING_CONFIG_SAMPLE_RATE: 0.1
USER_PORT: 8080
SYSTEM_NAMESPACE: knative-serving
METRICS_DOMAIN: knative.dev/internal/serving
SERVING_READINESS_PROBE: {"tcpSocket":{"port":8080,"host":"127.0.0.1"},"successThreshold":1}
ENABLE_PROFILING: false
SERVING_ENABLE_PROBE_REQUEST_LOG: false
METRICS_COLLECTOR_ADDRESS:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-lxw7l (ro)
istio-proxy:
Container ID: docker://219f18661c649f16b27ca2422abd20a8f1c5c8b5ab665a8abcba7f44e6e2c94d
Image: docker.io/istio/proxyv2:1.7.1
Image ID: docker-pullable://istio/proxyv2@sha256:4b6f682755956a957fd81a60ef246db79dae747278ec240752feb2c13135f322
Port: 15090/TCP
Host Port: 0/TCP
Args:
proxy
sidecar
--domain
$(POD_NAMESPACE).svc.cluster.local
--serviceCluster
bert-v2-predictor-default-00004.$(POD_NAMESPACE)
--proxyLogLevel=warning
--proxyComponentLogLevel=misc:error
--trust-domain=cluster.local
--concurrency
2
State: Running
Started: Thu, 07 Jan 2021 20:11:13 +0300
Ready: True
Restart Count: 0
Limits:
cpu: 2
memory: 1Gi
Requests:
cpu: 100m
memory: 128Mi
Readiness: http-get http://:15021/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30
Environment:
JWT_POLICY: first-party-jwt
PILOT_CERT_PROVIDER: istiod
CA_ADDR: istiod.istio-system.svc:15012
POD_NAME: bert-v2-predictor-default-00004-deployment-86d4dc64fc-sgc6s (v1:metadata.name)
POD_NAMESPACE: default (v1:metadata.namespace)
INSTANCE_IP: (v1:status.podIP)
SERVICE_ACCOUNT: (v1:spec.serviceAccountName)
HOST_IP: (v1:status.hostIP)
CANONICAL_SERVICE: (v1:metadata.labels['service.istio.io/canonical-name'])
CANONICAL_REVISION: (v1:metadata.labels['service.istio.io/canonical-revision'])
PROXY_CONFIG: {"proxyMetadata":{"DNS_AGENT":""}}
ISTIO_META_POD_PORTS: [
{"name":"user-port","containerPort":8080,"protocol":"TCP"}
,{"name":"http-queueadm","containerPort":8022,"protocol":"TCP"}
,{"name":"http-autometric","containerPort":9090,"protocol":"TCP"}
,{"name":"http-usermetric","containerPort":9091,"protocol":"TCP"}
,{"name":"queue-port","containerPort":8012,"protocol":"TCP"}
]
ISTIO_META_APP_CONTAINERS: kfserving-container,queue-proxy
ISTIO_META_CLUSTER_ID: Kubernetes
ISTIO_META_INTERCEPTION_MODE: REDIRECT
ISTIO_METAJSON_ANNOTATIONS: {"autoscaling.knative.dev/class":"kpa.autoscaling.knative.dev","autoscaling.knative.dev/minScale":"1","internal.serving.kubeflow.org/storage-initializer-sourceuri":"gs://kfserving-examples/models/triton/bert","serving.knative.dev/creator":"system:serviceaccount:kfserving-system:default","sidecar.istio.io/inject":"true"}
ISTIO_META_WORKLOAD_NAME: bert-v2-predictor-default-00004-deployment
ISTIO_META_OWNER: kubernetes://apis/apps/v1/namespaces/default/deployments/bert-v2-predictor-default-00004-deployment
ISTIO_META_MESH_ID: cluster.local
DNS_AGENT:
ISTIO_KUBE_APP_PROBERS: {}
Mounts:
/etc/istio/pod from istio-podinfo (rw)
/etc/istio/proxy from istio-envoy (rw)
/var/lib/istio/data from istio-data (rw)
/var/run/secrets/istio from istiod-ca-cert (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-lxw7l (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-lxw7l:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-lxw7l
Optional: false
kfserving-provision-location:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
istio-envoy:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
istio-data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
istio-podinfo:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.labels -> labels
metadata.annotations -> annotations
istiod-ca-cert:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: istio-ca-root-cert
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
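Given the container args above (`--model-store=/mnt/models`, `--http-port=8080`), one way to narrow this down is to bypass the Istio/Knative routing and inspect tritonserver directly inside the pod. A sketch, with the pod name taken from the describe output; all cluster-side commands are shown commented since they depend on this environment:

```shell
# Pod name from the kubectl describe output above.
POD=bert-v2-predictor-default-00004-deployment-86d4dc64fc-sgc6s

# 1. What did the storage initializer actually download? Triton names models
#    after the top-level directories under the model store:
#   kubectl exec "${POD}" -c kfserving-container -- ls -R /mnt/models
# 2. Triton's startup log prints a table of loaded models and any load errors:
#   kubectl logs "${POD}" -c kfserving-container | head -n 80
# 3. Talk to Triton's HTTP port directly, bypassing the mesh:
#   kubectl port-forward "pod/${POD}" 8080:8080 &
#   curl -s http://localhost:8080/v2/models/bert-v2
```

If the direct request also returns "unknown model", the problem is the repository layout or model name, not the Istio/Knative routing.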
Issue Analytics
- Created 3 years ago
- Comments: 14 (5 by maintainers)
Top GitHub Comments
@ontheway16 I figured out the endpoint is wrong: the transformer still works with the v1 protocol, while the Triton predictor speaks the v2 inference protocol. Will fix the doc, thanks for pointing out the issue!
If you try this endpoint, it should work:
curl -v -H "Host: ${SERVICE_HOSTNAME}" -d $INPUT_PATH http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict
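The `$INPUT_PATH` in that command is the payload file from the example. A minimal sketch of building and sanity-checking a v1-protocol payload locally; the `{"instances": [...]}` envelope is the KFServing v1 data-plane convention, but the instance contents shown here are hypothetical (the real input comes from the example's input file):

```shell
# Hypothetical v1-protocol payload -- only the {"instances": [...]} envelope
# is the v1 data-plane convention; the instance contents are illustrative.
cat > /tmp/bert-input.json <<'EOF'
{"instances": ["What President is credited with the original notion of putting Americans in space?"]}
EOF

# Sanity-check that the payload is valid JSON before sending it:
python3 -m json.tool /tmp/bert-input.json >/dev/null && echo "payload ok"

# Then, with SERVICE_HOSTNAME / INGRESS_HOST / INGRESS_PORT set as usual:
#   curl -v -H "Host: ${SERVICE_HOSTNAME}" -d @/tmp/bert-input.json \
#     http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/bert-v2:predict
```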
@yuzisun, I made some edits to readme.md and added the graph above as a .png file. Can you please review? Sorry, at the moment I am not sure about the PR process, so I copied the modified bert folder to Google Drive. You can alter/remove any text.
https://drive.google.com/drive/folders/1yMFtZW1Fs21HzK_WE_CJG6qGCIdQatXl?usp=sharing