no response from model, Triton server example #1283
Istio version: v1.7.1
Knative version: v0.19
Kubernetes version: v1.17
I have managed to get the Bert example's Triton deployment working, as seen in the output below, with curl -v http://bert-v2-predictor-default.default.192.168.1.235.xip.io/v2, but I could not get any other response, either from /v2/models or from /v2/models/bert-v2. Since the InferenceService status is READY True, I am assuming there is no problem with locating/loading the models. What else can I do to investigate further?
Some status info:
$ curl -v http://bert-v2-predictor-default.default.192.168.1.235.xip.io/v2
* Trying 192.168.1.235...
* TCP_NODELAY set
* Connected to bert-v2-predictor-default.default.192.168.1.235.xip.io (192.168.1.235) port 80 (#0)
> GET /v2 HTTP/1.1
> Host: bert-v2-predictor-default.default.192.168.1.235.xip.io
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< content-length: 215
< content-type: application/json
< date: Fri, 08 Jan 2021 05:58:06 GMT
< server: istio-envoy
< x-envoy-upstream-service-time: 2
<
* Connection #0 to host bert-v2-predictor-default.default.192.168.1.235.xip.io left intact
{"name":"triton","version":"2.4.0","extensions":["classification","sequence","model_repository","schedule_policy","model_configuration","system_shared_memory","cuda_shared_memory","binary_tensor_data","statistics"]}
$ curl -v http://bert-v2-predictor-default.default.192.168.1.235.xip.io/v2/models/bert-v2
* Trying 192.168.1.235...
...
...
< HTTP/1.1 400 Bad Request
< content-length: 61
< content-type: application/json
< date: Fri, 08 Jan 2021 05:59:41 GMT
< server: istio-envoy
< x-envoy-upstream-service-time: 2
<
* Connection #0 to host bert-v2-predictor-default.default.192.168.1.235.xip.io left intact
{"error":"Request for unknown model: 'bert-v2' is not found"}
The same HTTP 400 is returned for /v2/models as well.
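The 400 suggests the server itself is up but the model name does not match anything Triton loaded. A small sketch of probing the v2 endpoints in order of specificity, using the host from the curl output above; the `v2_url` helper is purely illustrative, and the repository-index call assumes the model_repository extension (which the /v2 metadata response does advertise) is enabled:

```shell
# Illustrative helper to build v2 endpoint URLs against the host above.
HOST=bert-v2-predictor-default.default.192.168.1.235.xip.io
v2_url() { echo "http://${HOST}/v2${1:+/$1}"; }

# Server metadata (already returns 200 in the issue):
#   curl -s "$(v2_url '')"
# Model readiness -- a 400 here usually means the name does not match a
# top-level directory in the model repository:
#   curl -s "$(v2_url 'models/bert-v2/ready')"
# The model_repository extension exposes a repository index, which lists
# what Triton actually sees (note it is a POST):
#   curl -s -X POST "$(v2_url 'repository/index')"
```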
$ kubectl get inferenceservices --all-namespaces
NAMESPACE        NAME                   URL                                                         READY   AGE
default          bert-v2                http://bert-v2.default.192.168.1.235.xip.io                 True    4d7h
default          triton-simple-string   http://triton-simple-string.default.192.168.1.235.xip.io    True    7h27m
kfserving-test   sklearn-iris           http://sklearn-iris.kfserving-test.192.168.1.235.xip.io     True    8d
$ kubectl describe inferenceservice bert-v2
Name: bert-v2
Namespace: default
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"serving.kubeflow.org/v1beta1","kind":"InferenceService","metadata":{"annotations":{"sidecar.istio.io/inject":"true"},"name"...
sidecar.istio.io/inject: true
API Version: serving.kubeflow.org/v1beta1
Kind: InferenceService
Metadata:
Creation Timestamp: 2021-01-03T23:07:55Z
Finalizers:
inferenceservice.finalizers
Generation: 1
Resource Version: 5655556
Self Link: /apis/serving.kubeflow.org/v1beta1/namespaces/default/inferenceservices/bert-v2
UID: 08b2d11e-15f2-48ac-8bf7-0096d5e0959c
Spec:
Predictor:
Triton:
Name: kfserving-container
Resources:
Limits:
Cpu: 1
Memory: 8Gi
Requests:
Cpu: 1
Memory: 8Gi
Runtime Version: 20.10-py3
Storage Uri: gs://kfserving-examples/models/triton/bert
Status:
Address:
URL: http://bert-v2.default.svc.cluster.local/v2/models/bert-v2/infer
Components:
Predictor:
Address:
URL: http://bert-v2-predictor-default.default.svc.cluster.local
Latest Created Revision: bert-v2-predictor-default-00004
Latest Ready Revision: bert-v2-predictor-default-00004
Previous Ready Revision: bert-v2-predictor-default-00003
Traffic Percent: 100
URL: http://bert-v2-predictor-default.default.192.168.1.235.xip.io
Conditions:
Last Transition Time: 2021-01-06T17:36:36Z
Status: True
Type: IngressReady
Last Transition Time: 2021-01-06T17:36:36Z
Severity: Info
Status: True
Type: PredictorConfigurationReady
Last Transition Time: 2021-01-06T17:36:36Z
Status: True
Type: PredictorReady
Last Transition Time: 2021-01-06T17:36:36Z
Severity: Info
Status: True
Type: PredictorRouteReady
Last Transition Time: 2021-01-06T17:36:36Z
Status: True
Type: Ready
URL: http://bert-v2.default.192.168.1.235.xip.io
Events: <none>
$ kubectl describe pods bert-v2
Name: bert-v2-predictor-default-00004-deployment-86d4dc64fc-sgc6s
Namespace: default
Priority: 0
Node: masternode01/192.168.1.133
Start Time: Thu, 07 Jan 2021 19:51:31 +0300
Labels: app=bert-v2-predictor-default-00004
component=predictor
istio.io/rev=default
pod-template-hash=86d4dc64fc
security.istio.io/tlsMode=istio
service.istio.io/canonical-name=bert-v2-predictor-default
service.istio.io/canonical-revision=bert-v2-predictor-default-00004
serving.knative.dev/configuration=bert-v2-predictor-default
serving.knative.dev/configurationGeneration=4
serving.knative.dev/revision=bert-v2-predictor-default-00004
serving.knative.dev/revisionUID=363adc4d-bc9e-42b9-80c4-11d7ee7b90d7
serving.knative.dev/service=bert-v2-predictor-default
serving.kubeflow.org/inferenceservice=bert-v2
Annotations: autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
autoscaling.knative.dev/minScale: 1
internal.serving.kubeflow.org/storage-initializer-sourceuri: gs://kfserving-examples/models/triton/bert
prometheus.io/path: /stats/prometheus
prometheus.io/port: 15020
prometheus.io/scrape: true
serving.knative.dev/creator: system:serviceaccount:kfserving-system:default
sidecar.istio.io/inject: true
sidecar.istio.io/status:
{"version":"8e6e902b765af607513b28d284940ee1421e9a0d07698741693b2663c7161c11","initContainers":["istio-init"],"containers":["istio-proxy"]...
Status: Running
IP: 10.244.0.83
IPs:
IP: 10.244.0.83
Controlled By: ReplicaSet/bert-v2-predictor-default-00004-deployment-86d4dc64fc
Init Containers:
storage-initializer:
Container ID: docker://9864e28d8a37026689b7fb9bc1f766c9529f1af8de294515ce28f8b869ea9524
Image: gcr.io/kfserving/storage-initializer:v0.5.0-rc1
Image ID: docker-pullable://gcr.io/kfserving/storage-initializer@sha256:bd5ad7ca7a42c127f046362dcf3ab48db0475438544911fb04b29b253c6ebdcd
Port: <none>
Host Port: <none>
Args:
gs://kfserving-examples/models/triton/bert
/mnt/models
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 07 Jan 2021 19:51:32 +0300
Finished: Thu, 07 Jan 2021 20:11:07 +0300
Ready: True
Restart Count: 0
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 100m
memory: 100Mi
Environment: <none>
Mounts:
/mnt/models from kfserving-provision-location (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-lxw7l (ro)
istio-init:
Container ID: docker://8e0ffedcbd57a84a19de38290dbc8194623058d19a18ec5913ea31490d8424f0
Image: docker.io/istio/proxyv2:1.7.1
Image ID: docker-pullable://istio/proxyv2@sha256:4b6f682755956a957fd81a60ef246db79dae747278ec240752feb2c13135f322
Port: <none>
Host Port: <none>
Args:
istio-iptables
-p
15001
-z
15006
-u
1337
-m
REDIRECT
-i
*
-x
-b
*
-d
15090,15021,15020
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 07 Jan 2021 20:11:10 +0300
Finished: Thu, 07 Jan 2021 20:11:10 +0300
Ready: True
Restart Count: 0
Limits:
cpu: 2
memory: 1Gi
Requests:
cpu: 10m
memory: 10Mi
Environment:
DNS_AGENT:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-lxw7l (ro)
Containers:
kfserving-container:
Container ID: docker://f25d56d871c246227576240c21b14d6ed30018feed916aae92a187a02566cd49
Image: nvcr.io/nvidia/tritonserver:20.10-py3
Image ID: docker-pullable://nvcr.io/nvidia/tritonserver@sha256:28a458eac4d888329c9a7420032f52be27fd75fef670c99c598bca76433341c0
Port: 8080/TCP
Host Port: 0/TCP
Args:
tritonserver
--model-store=/mnt/models
--grpc-port=9000
--http-port=8080
--allow-grpc=true
--allow-http=true
State: Running
Started: Thu, 07 Jan 2021 20:11:11 +0300
Ready: True
Restart Count: 0
Limits:
cpu: 1
memory: 8Gi
Requests:
cpu: 1
memory: 8Gi
Environment:
PORT: 8080
K_REVISION: bert-v2-predictor-default-00004
K_CONFIGURATION: bert-v2-predictor-default
K_SERVICE: bert-v2-predictor-default
Mounts:
/mnt/models from kfserving-provision-location (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-lxw7l (ro)
queue-proxy:
Container ID: docker://2e7ad32448481943779df7fc770d43292efe0357638db897a40fa078464dd246
Image: gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:1a569afd4c34e285f6d647633925e2b684899bc8d01b4894047c90b75ca49357
Image ID: docker-pullable://gcr.io/knative-releases/knative.dev/serving/cmd/queue@sha256:1a569afd4c34e285f6d647633925e2b684899bc8d01b4894047c90b75ca49357
Ports: 8022/TCP, 9090/TCP, 9091/TCP, 8012/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP
State: Running
Started: Thu, 07 Jan 2021 20:11:11 +0300
Ready: True
Restart Count: 0
Requests:
cpu: 25m
Readiness: exec [/ko-app/queue -probe-period 0] delay=0s timeout=10s period=10s #success=1 #failure=3
Environment:
SERVING_NAMESPACE: default
SERVING_SERVICE: bert-v2-predictor-default
SERVING_CONFIGURATION: bert-v2-predictor-default
SERVING_REVISION: bert-v2-predictor-default-00004
QUEUE_SERVING_PORT: 8012
CONTAINER_CONCURRENCY: 0
REVISION_TIMEOUT_SECONDS: 300
SERVING_POD: bert-v2-predictor-default-00004-deployment-86d4dc64fc-sgc6s (v1:metadata.name)
SERVING_POD_IP: (v1:status.podIP)
SERVING_LOGGING_CONFIG: {
"level": "info",
"development": false,
"outputPaths": ["stdout"],
"errorOutputPaths": ["stderr"],
"encoding": "json",
"encoderConfig": {
"timeKey": "ts",
"levelKey": "level",
"nameKey": "logger",
"callerKey": "caller",
"messageKey": "msg",
"stacktraceKey": "stacktrace",
"lineEnding": "",
"levelEncoder": "",
"timeEncoder": "iso8601",
"durationEncoder": "",
"callerEncoder": ""
}
}
SERVING_LOGGING_LEVEL:
SERVING_REQUEST_LOG_TEMPLATE: {"httpRequest": {"requestMethod": "{{.Request.Method}}", "requestUrl": "{{js .Request.RequestURI}}", "requestSize": "{{.Request.ContentLength}}", "status": {{.Response.Code}}, "responseSize": "{{.Response.Size}}", "userAgent": "{{js .Request.UserAgent}}", "remoteIp": "{{js .Request.RemoteAddr}}", "serverIp": "{{.Revision.PodIP}}", "referer": "{{js .Request.Referer}}", "latency": "{{.Response.Latency}}s", "protocol": "{{.Request.Proto}}"}, "traceId": "{{index .Request.Header "X-B3-Traceid"}}"}
SERVING_ENABLE_REQUEST_LOG: false
SERVING_REQUEST_METRICS_BACKEND: prometheus
TRACING_CONFIG_BACKEND: none
TRACING_CONFIG_ZIPKIN_ENDPOINT:
TRACING_CONFIG_STACKDRIVER_PROJECT_ID:
TRACING_CONFIG_DEBUG: false
TRACING_CONFIG_SAMPLE_RATE: 0.1
USER_PORT: 8080
SYSTEM_NAMESPACE: knative-serving
METRICS_DOMAIN: knative.dev/internal/serving
SERVING_READINESS_PROBE: {"tcpSocket":{"port":8080,"host":"127.0.0.1"},"successThreshold":1}
ENABLE_PROFILING: false
SERVING_ENABLE_PROBE_REQUEST_LOG: false
METRICS_COLLECTOR_ADDRESS:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-lxw7l (ro)
istio-proxy:
Container ID: docker://219f18661c649f16b27ca2422abd20a8f1c5c8b5ab665a8abcba7f44e6e2c94d
Image: docker.io/istio/proxyv2:1.7.1
Image ID: docker-pullable://istio/proxyv2@sha256:4b6f682755956a957fd81a60ef246db79dae747278ec240752feb2c13135f322
Port: 15090/TCP
Host Port: 0/TCP
Args:
proxy
sidecar
--domain
$(POD_NAMESPACE).svc.cluster.local
--serviceCluster
bert-v2-predictor-default-00004.$(POD_NAMESPACE)
--proxyLogLevel=warning
--proxyComponentLogLevel=misc:error
--trust-domain=cluster.local
--concurrency
2
State: Running
Started: Thu, 07 Jan 2021 20:11:13 +0300
Ready: True
Restart Count: 0
Limits:
cpu: 2
memory: 1Gi
Requests:
cpu: 100m
memory: 128Mi
Readiness: http-get http://:15021/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30
Environment:
JWT_POLICY: first-party-jwt
PILOT_CERT_PROVIDER: istiod
CA_ADDR: istiod.istio-system.svc:15012
POD_NAME: bert-v2-predictor-default-00004-deployment-86d4dc64fc-sgc6s (v1:metadata.name)
POD_NAMESPACE: default (v1:metadata.namespace)
INSTANCE_IP: (v1:status.podIP)
SERVICE_ACCOUNT: (v1:spec.serviceAccountName)
HOST_IP: (v1:status.hostIP)
CANONICAL_SERVICE: (v1:metadata.labels['service.istio.io/canonical-name'])
CANONICAL_REVISION: (v1:metadata.labels['service.istio.io/canonical-revision'])
PROXY_CONFIG: {"proxyMetadata":{"DNS_AGENT":""}}
ISTIO_META_POD_PORTS: [
{"name":"user-port","containerPort":8080,"protocol":"TCP"}
,{"name":"http-queueadm","containerPort":8022,"protocol":"TCP"}
,{"name":"http-autometric","containerPort":9090,"protocol":"TCP"}
,{"name":"http-usermetric","containerPort":9091,"protocol":"TCP"}
,{"name":"queue-port","containerPort":8012,"protocol":"TCP"}
]
ISTIO_META_APP_CONTAINERS: kfserving-container,queue-proxy
ISTIO_META_CLUSTER_ID: Kubernetes
ISTIO_META_INTERCEPTION_MODE: REDIRECT
ISTIO_METAJSON_ANNOTATIONS: {"autoscaling.knative.dev/class":"kpa.autoscaling.knative.dev","autoscaling.knative.dev/minScale":"1","internal.serving.kubeflow.org/storage-initializer-sourceuri":"gs://kfserving-examples/models/triton/bert","serving.knative.dev/creator":"system:serviceaccount:kfserving-system:default","sidecar.istio.io/inject":"true"}
ISTIO_META_WORKLOAD_NAME: bert-v2-predictor-default-00004-deployment
ISTIO_META_OWNER: kubernetes://apis/apps/v1/namespaces/default/deployments/bert-v2-predictor-default-00004-deployment
ISTIO_META_MESH_ID: cluster.local
DNS_AGENT:
ISTIO_KUBE_APP_PROBERS: {}
Mounts:
/etc/istio/pod from istio-podinfo (rw)
/etc/istio/proxy from istio-envoy (rw)
/var/lib/istio/data from istio-data (rw)
/var/run/secrets/istio from istiod-ca-cert (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-lxw7l (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-lxw7l:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-lxw7l
Optional: false
kfserving-provision-location:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
istio-envoy:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
istio-data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
istio-podinfo:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.labels -> labels
metadata.annotations -> annotations
istiod-ca-cert:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: istio-ca-root-cert
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>
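Given the container args above (`--model-store=/mnt/models`, `--http-port=8080`), one way to narrow this down is to bypass the Istio/Knative routing and inspect tritonserver directly inside the pod. A sketch, with the pod name taken from the describe output; all cluster-side commands are shown commented since they depend on this environment:

```shell
# Pod name from the kubectl describe output above.
POD=bert-v2-predictor-default-00004-deployment-86d4dc64fc-sgc6s

# 1. What did the storage initializer actually download? Triton names models
#    after the top-level directories under the model store:
#   kubectl exec "${POD}" -c kfserving-container -- ls -R /mnt/models
# 2. Triton's startup log prints a table of loaded models and any load errors:
#   kubectl logs "${POD}" -c kfserving-container | head -n 80
# 3. Talk to Triton's HTTP port directly, bypassing the mesh:
#   kubectl port-forward "pod/${POD}" 8080:8080 &
#   curl -s http://localhost:8080/v2/models/bert-v2
```

If the direct request also returns "unknown model", the problem is the repository layout or model name, not the Istio/Knative routing.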
Issue Analytics
- Created 3 years ago
- Comments: 14 (5 by maintainers)
Top GitHub Comments
@ontheway16 I figured out the endpoint is wrong: the transformer still works with the v1 protocol, while the Triton predictor speaks the v2 inference protocol. Will fix the doc, thanks for pointing out the issue!
If you try this endpoint, it should work:
curl -v -H "Host: ${SERVICE_HOSTNAME}" -d $INPUT_PATH http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/$MODEL_NAME:predict
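The `$INPUT_PATH` in that command is the payload file from the example. A minimal sketch of building and sanity-checking a v1-protocol payload locally; the `{"instances": [...]}` envelope is the KFServing v1 data-plane convention, but the instance contents shown here are hypothetical (the real input comes from the example's input file):

```shell
# Hypothetical v1-protocol payload -- only the {"instances": [...]} envelope
# is the v1 data-plane convention; the instance contents are illustrative.
cat > /tmp/bert-input.json <<'EOF'
{"instances": ["What President is credited with the original notion of putting Americans in space?"]}
EOF

# Sanity-check that the payload is valid JSON before sending it:
python3 -m json.tool /tmp/bert-input.json >/dev/null && echo "payload ok"

# Then, with SERVICE_HOSTNAME / INGRESS_HOST / INGRESS_PORT set as usual:
#   curl -v -H "Host: ${SERVICE_HOSTNAME}" -d @/tmp/bert-input.json \
#     http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/bert-v2:predict
```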
@yuzisun, I made some edits to readme.md and added the graph above as a .png file. Can you please review? Sorry, at the moment I am not sure about the PR process, so I copied the modified bert folder to Google Drive. You can alter/remove any text.
https://drive.google.com/drive/folders/1yMFtZW1Fs21HzK_WE_CJG6qGCIdQatXl?usp=sharing