Understanding Trace File

Hey,

Basic Info

We are trying to create a production-ready Triton container and collect its request logs. For that we found Triton's trace configuration, which gives us exactly what we want: timing metrics for every request sent to the server. The reason we are not using Prometheus and the /metrics route will be explained by my co-worker in the next comment on this issue (questions).

Usage Info

We run the container on Kubernetes with a Logstash sidecar that reads the trace JSON file and sends it to Elasticsearch (each Logstash read deletes what it has read from the JSON file so the file does not grow indefinitely).
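
For context, a minimal sketch of what the sidecar loop does, assuming the trace file path below and that truncating after a read is acceptable (in our real setup the forwarding is done by Logstash, not Python):

import json

TRACE_FILE = "/tmp/trace.json"  # path assumed; matches the --trace-file flag below

def ship_to_elastic(entries):
    # stand-in for the Logstash -> Elasticsearch pipeline
    print(f"shipping {len(entries)} trace entries")

# read whatever Triton has written so far, forward it, then truncate the
# file so it never grows indefinitely
with open(TRACE_FILE, "r+") as f:
    raw = f.read()
    if raw.strip():
        ship_to_elastic(json.loads(raw))
        f.seek(0)
        f.truncate()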

We have some questions about the intention of this feature and its functionality:

TRITON VERSION -> 22.02-py3 -> Documentation -> Trace Configurations
$ tritonserver --trace-file=/tmp/trace.json --trace-rate=100 --trace-level=TIMESTAMPS ...
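
For the per-request use case described above, the configuration we are aiming for would look roughly like the command below: --trace-rate=1 so that every request is traced, and --trace-log-frequency is the 22.02 flag discussed in the questions further down (the value of 50 is only an example, not a recommendation).

$ tritonserver --trace-file=/tmp/trace.json --trace-rate=1 --trace-level=TIMESTAMPS --trace-log-frequency=50 ...
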
TRITON VERSION -> 21.12-py3 -> Documentation -> Trace json file structure
[
  {
    "model_name": $string,
    "model_version": $number,
    "id": $number
    "parent_id": $number,
    "timestamps": [
      { "name" : $string, "ns" : $number },
      ...
    ]
  },
  ...
]
----------------------------------------------------------------------------
[
  {
    "model_name": "simple",
    "model_version": -1,
    "id":1,
    "timestamps" : [
      { "name": "http recv start", "ns": 2259961222771924 },
      { "name": "http recv end", "ns": 2259961222820985 },
      { "name": "request handler start", "ns": 2259961223164078 },
      { "name": "queue start", "ns": 2259961223182400 },
      { "name": "compute start", "ns": 2259961223232405 },
      { "name": "compute end", "ns": 2259961230206777 },
      { "name": "request handler end", "ns": 2259961230211887 },
      { "name": "http send start", "ns": 2259961230529606 },
      { "name": "http send end", "ns": 2259961230543930 }
    ]
  }
]
  • In the docs you set the trace-rate to 100, and I think the default is either 1000 or 100. As said above, we want the data this feature produces for every request, i.e. setting the trace-rate to 1 (as in the command sketch above). Is this the intended use of the feature? (As said above, Logstash deletes what it has read from the file, so we effectively have a retention policy.)
  • We have encountered something mysterious (Triton 21.12-py3/21.09-py3): the trace logs aren’t written to the trace JSON file until tritonserver exits (the command, not the container). That makes no sense to us. We did find a parameter that can be configured only in the 22.02-py3 version (we haven’t checked 22.01-py3): the trace-log-frequency parameter, which would be the equivalent of a bulk/flush setting, but we can’t find how to set it in 21.09/21.12. Furthermore, what value would you advise setting it to, considering the IO involved, and how does that IO affect the availability of the server, if at all?
  • The previous question raises an interesting point about which features are added to Triton in which version. I have a PyTorch model, but the PyTorch backend with the PyTorch version I need is only supported up to Triton 21.09-py3, and at the same time I obviously want this great tracing feature. The only thing we have thought of is to set the --backend-directory parameter to point at the PyTorch backend directory shipped in the 21.09-py3 backends directory, but that has its own potential future issues. The main one, as I understand it, is that the PyTorch backend uses the Python backend stub, so its dependencies are tied to the default Python version on the server (which happens to be 3.8 across all the versions I have used so far, but that doesn’t feel future-proof). It feels strange that extra features like this come at the expense of backend versioning.
  • As shown in the docs, the JSON file data structure is great. We have tried it on 21.12-py3 and got a different result: still usable, but not as nice to work with as the one in the docs.
[
  {
    "id": $number,
    "model_name": $string
  },
  {
    "id": $number,
    "model_version": $number
  },
  {
    "id": $number,
    "http recv start": $number
  },
  ...
]

This is the structure we receive. The id is the same for every entry belonging to a traced request, so we can reconstruct the structure shown in the docs (see the sketch below), but maybe this is a bug or we have forgotten to set something. What should we do?
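
For reference, a minimal sketch of the merge we are doing today, assuming the file parses as a single JSON array and that every key other than id, model_name, model_version and parent_id is a timestamp name:

import json
from collections import defaultdict

# group the flat 21.12 entries by trace id and rebuild the per-request
# structure shown in the docs
with open("/tmp/trace.json") as f:
    flat_entries = json.load(f)

merged = defaultdict(lambda: {"timestamps": []})
for entry in flat_entries:
    trace_id = entry["id"]
    for key, value in entry.items():
        if key == "id":
            continue
        if key in ("model_name", "model_version", "parent_id"):
            merged[trace_id][key] = value
        else:
            # e.g. "http recv start", "compute end", ...
            merged[trace_id]["timestamps"].append({"name": key, "ns": value})

traces = [{"id": trace_id, **fields} for trace_id, fields in merged.items()]
print(json.dumps(traces, indent=2))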

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 15 (10 by maintainers)

Top GitHub Comments

1 reaction
GuanLuo commented, Mar 16, 2022

Yes, the trace documentation needs to be updated; the trace log structure is no longer the same as what is described in the documentation. @rmccorm4, can you file a ticket to track this?

@bro-adm The backend is decoupled from Triton core, so you should be able to change the backend to the version that you intend to use while using the newer version of Triton. Of course you need to satisfy the backend requirement as you described, but Triton itself doesn’t have too many external dependencies, so you can probably copy Triton to the environment that the desired backend can run on.
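
A rough sketch of one way to do that swap (image tags and paths below are assumptions, not something confirmed in this thread): copy the PyTorch backend out of the 21.09 image and point a newer Triton at it via --backend-directory.

# copy the PyTorch backend out of the 21.09 image (paths assumed)
mkdir -p backends_21.09
docker create --name triton2109 nvcr.io/nvidia/tritonserver:21.09-py3
docker cp triton2109:/opt/tritonserver/backends/pytorch ./backends_21.09/
docker rm triton2109

# run a newer Triton, loading backends from the copied directory
# (in this sketch only the PyTorch backend lives there)
tritonserver --model-repository=/models --backend-directory=/path/to/backends_21.09 ...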

0 reactions
brightsparc commented, Jun 27, 2022

Hi, is there any update on this feature request?
