
How can multiple instances on the same device be concurrent?

See original GitHub issue

After reading the code, I am confused about how multiple instances on the same device can run concurrently.

Multiple instances on the same device share a TritonBackendThread object, in src/backends/backend/triton_model_instance.cc: [screenshot]

model_instances_ stores all the instances on this device: [screenshot]

In the function TritonModelInstance::TritonBackendThread::BackendThread: [screenshot]

My question is: suppose there are two instances of model x, A and B, on device 0. model_->Server()->GetRateLimiter()->DequeuePayload(model_instances_, &payload); obtains a payload, and suppose instance A is assigned to it; then payload->Execute() starts the forward pass.

Does instance B have to wait to be assigned a payload and execute until instance A has completed?
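To make the concern concrete, here is a toy, self-contained C++ sketch (not Triton code; the Payload struct and the loop are made up for illustration) of the serial behaviour being asked about: one shared backend thread dequeues payloads one at a time and executes each synchronously, so a payload assigned to instance B waits until instance A's payload has finished.

```cpp
#include <chrono>
#include <iostream>
#include <queue>
#include <string>
#include <thread>

struct Payload {
  std::string instance;  // which instance the rate limiter assigned
  int request_id;
};

int main() {
  std::queue<Payload> pending;
  pending.push({"A", 1});
  pending.push({"B", 2});  // B's payload sits behind A's on the shared thread

  // The single shared "backend thread": dequeue one payload, execute it
  // synchronously, and only then dequeue the next one.
  std::thread backend_thread([&pending]() {
    while (!pending.empty()) {
      Payload p = pending.front();
      pending.pop();
      std::cout << "executing request " << p.request_id << " on instance "
                << p.instance << std::endl;
      // Stand-in for a synchronous forward(); nothing else runs on this
      // thread while it is in progress.
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
      std::cout << "finished request " << p.request_id << std::endl;
    }
  });
  backend_thread.join();
  return 0;
}
```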

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

3 reactions
tanmayv25 commented, May 5, 2022

@FreshZZ Thanks for asking this question. Backend thread sharing is only implemented for GPU instances, and only when the backend enables the device_blocking execution policy. See this line: https://github.com/triton-inference-server/core/blob/main/src/backend_model_instance.cc#L314-321 If device_blocking is false, then each TritonModelInstance creates its own triton_backend_thread_ and hence achieves full concurrency.
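A minimal sketch of that sharing rule, with purely illustrative names (GetOrCreateBackendThread and the local enums are stand-ins, not the actual code in backend_model_instance.cc): only GPU instances of a backend that requested device_blocking reuse a per-device thread; everything else gets its own.

```cpp
#include <cassert>
#include <map>
#include <memory>

// These mirror the TRITONBACKEND_ExecutionPolicy values in tritonbackend.h
// (TRITONBACKEND_EXECUTION_BLOCKING / TRITONBACKEND_EXECUTION_DEVICE_BLOCKING),
// redefined locally so the sketch stands alone.
enum ExecutionPolicy { EXECUTION_BLOCKING, EXECUTION_DEVICE_BLOCKING };
enum InstanceKind { KIND_CPU, KIND_GPU };

// Stand-in for the object that owns the dequeue/execute loop.
struct BackendThread {};

// Hypothetical helper: decides whether an instance reuses a per-device
// backend thread or gets a dedicated one.
std::shared_ptr<BackendThread> GetOrCreateBackendThread(
    ExecutionPolicy policy, InstanceKind kind, int device_id,
    std::map<int, std::shared_ptr<BackendThread>>& device_threads) {
  if (policy == EXECUTION_DEVICE_BLOCKING && kind == KIND_GPU) {
    // device_blocking + GPU: all instances on this device share one thread,
    // so instances A and B on device 0 are both serviced by device_threads[0].
    auto& thread = device_threads[device_id];
    if (thread == nullptr) {
      thread = std::make_shared<BackendThread>();
    }
    return thread;
  }
  // Otherwise each TritonModelInstance gets its own backend thread, so
  // instances on the same device execute fully concurrently.
  return std::make_shared<BackendThread>();
}

int main() {
  std::map<int, std::shared_ptr<BackendThread>> device_threads;
  auto a = GetOrCreateBackendThread(EXECUTION_DEVICE_BLOCKING, KIND_GPU, 0, device_threads);
  auto b = GetOrCreateBackendThread(EXECUTION_DEVICE_BLOCKING, KIND_GPU, 0, device_threads);
  assert(a == b);  // shared thread on device 0
  auto c = GetOrCreateBackendThread(EXECUTION_BLOCKING, KIND_GPU, 0, device_threads);
  assert(c != a);  // dedicated thread when device_blocking is not used
  return 0;
}
```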

See here how Triton core detects that the backend has the device_blocking execution policy set: https://github.com/triton-inference-server/core/blob/main/src/backend_model.cc#L175-L185

Read more about the device_blocking execution policy here: https://github.com/triton-inference-server/core/blob/main/include/triton/core/tritonbackend.h#L781-L794

The behavior of a backend requesting the device_blocking execution policy is as you observed. At present only the TensorRT backend uses the device_blocking execution policy, because its execution is asynchronous: even with only a single backend thread, the backend is able to run multiple inference requests concurrently.
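A rough illustration of that last point (pure illustration, with std::async standing in for an asynchronous enqueue such as TensorRT's on a CUDA stream): the single backend thread only launches work and immediately moves on to the next payload, so several requests can be in flight at once.

```cpp
#include <chrono>
#include <future>
#include <iostream>
#include <thread>
#include <vector>

int main() {
  std::vector<std::future<int>> in_flight;

  // The single "backend thread" loop: launch each request, do not wait for it.
  for (int request_id = 1; request_id <= 3; ++request_id) {
    in_flight.push_back(std::async(std::launch::async, [request_id]() {
      // Stand-in for an inference that executes asynchronously.
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
      return request_id;
    }));
    std::cout << "launched request " << request_id << std::endl;
  }

  // All three requests are now in flight even though a single thread
  // launched them; completion is collected separately.
  for (auto& f : in_flight) {
    std::cout << "completed request " << f.get() << std::endl;
  }
  return 0;
}
```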

2 reactions
FreshZZ commented, May 7, 2022

@tanmayv25 Thank you so much


Top Results From Across the Web

  • How to handle multiple concurrent instances at the same time?
    The first thing your program should do is try to create a shared memory segment (using your own made up key) and store...
  • Run multiple concurrent UI flows on a single Windows Server ...
    Use two or more user accounts to create UI Flows connections targeting the gateway on this machine. You can now run multiple UI...
  • Maximum concurrent requests per instance (services)
    By default each Cloud Run container instance can receive up to 80 requests at the same time; you can increase this to a...
  • How to Install Multiple Copies and Run Multiple Instances of ...
    Here is how you can run multiple instances of an app using Parallel Space: Open Parallel Space and tap on the apps you...
  • Is it possible to have several people on my team connected to ...
    Yes. Multiple people can connect to a single instance (concurrent device). Note that every person connected to the instance will see the...
