Submitting raw data via IPC
Hi, thanks for open-sourcing this project!
I experimented with the TensorRT inference server, and I found that with my target model (a TensorRT execution plan that has FP16 inputs and outputs), maxing out my system's two GPUs requires sending about 1.2 GBytes per second through the network stack. Scaled linearly, eight GPUs would need roughly 4.8 GBytes per second (about 38 Gbit/s), which is more than a single 10 GbE link can carry. In my view, this means that scaling this architecture to a server with eight (or even more) GPUs requires either (multiple) IB interconnects, or a preprocessor co-located with the inference server, which receives compressed images and sends raw data to the TRT server.
Once we assume that a preprocessor is located on the same physical node as the TRT inference server (and hope that the CPUs do not become the new bottleneck), it would be much preferable to submit raw data via IPC (e.g. through /dev/shm) to the inference server, and thus avoid the overhead introduced by gRPC.
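For illustration, here is a minimal producer-side sketch of that idea, using POSIX shared memory (which Linux backs with /dev/shm). The region name and tensor shape are made up for the example and are not part of TRTIS:

```cpp
// Hypothetical preprocessor: write a raw FP16 tensor into a named POSIX
// shared-memory region, so that only {name, offset, size} rather than
// the data itself would need to travel over gRPC. Link with -lrt on
// older glibc.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>
#include <vector>

int main() {
  const char* region_name = "/trt_input_0";  // shows up as /dev/shm/trt_input_0
  const size_t byte_size = 3 * 224 * 224 * sizeof(uint16_t);  // example FP16 image

  // Create the region and size it.
  int fd = shm_open(region_name, O_CREAT | O_RDWR, 0666);
  if (fd == -1) return 1;
  if (ftruncate(fd, byte_size) == -1) return 1;

  // Map it and copy the preprocessed tensor in.
  void* base = mmap(nullptr, byte_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (base == MAP_FAILED) return 1;
  std::vector<uint16_t> fp16_data(byte_size / sizeof(uint16_t));  // stand-in for real output
  std::memcpy(base, fp16_data.data(), byte_size);

  munmap(base, byte_size);
  close(fd);
  return 0;
}
```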
Here are my questions:
- Are the above assessment and the conclusions I draw from it reasonable?
- Do you have “submission of raw data via IPC mechanisms” on your roadmap? E.g. a feature where one submits a reference to the blob of preprocessed data in shared memory to the server via gRPC, and the server then loads this blob and uses it as input. If so, when do you plan on releasing it?
- If I were to implement a version of this myself, do you agree that a first quick-and-dirty approach would be to a) change the gRPC service proto, and then b) change GRPCInferRequestProvider::GetNextInputContent in tensorrt-inference-server/src/core/infer.cc accordingly? Did I overlook a place where changes are necessary? (A rough sketch of what b) might look like follows this list.)
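Purely to make step b) concrete, here is a rough consumer-side sketch. It assumes the request proto were extended with hypothetical shm_name/offset/byte_size fields; none of these exist in the actual service definition, and the real GetNextInputContent has a different signature:

```cpp
// Hypothetical server-side helper: instead of copying tensor bytes out
// of the gRPC message, map the named region and return a pointer into
// it. 'shm_name', 'offset', and 'byte_size' would come from imagined
// new request fields; a real version would also have to keep 'base'
// around so the mapping can be munmap'd after the inference completes.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

const void* MapSharedMemoryInput(
    const char* shm_name, size_t offset, size_t byte_size) {
  int fd = shm_open(shm_name, O_RDONLY, 0);
  if (fd == -1) return nullptr;

  // Map from the start of the region; mmap's own offset argument must
  // be page-aligned, so applying 'offset' in user space is simpler.
  void* base = mmap(nullptr, offset + byte_size, PROT_READ, MAP_SHARED, fd, 0);
  close(fd);  // the mapping stays valid after the fd is closed
  if (base == MAP_FAILED) return nullptr;
  return static_cast<const char*>(base) + offset;
}
```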
Again, thanks for making this tool available.
Top GitHub Comments
We have just started work on implementing a shared-memory API (option C). Changes will start to come into master and we expect to have an initial minimal implementation in about 3 weeks. The API will allow input and output tensors to be passed to/from TRTIS via shared memory instead of over the network. It will be the responsibility of an outside “agent” to create and manage the lifetime of the shared-memory regions. TRTIS will provide APIs that allow that “agent” to register/unregister these shared-memory regions with TRTIS, after which they can be used in inference requests.
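To make the “agent” responsibility concrete, here is a small RAII sketch of my own (not TRTIS code) for owning such a region; the actual TRTIS register/unregister calls are left as comments, since the real client API is shown in the example linked below:

```cpp
// Illustrative RAII owner for a POSIX shared-memory region, standing in
// for the "agent" that creates a region, registers it with TRTIS, and
// tears everything down again. The TRTIS calls themselves are omitted.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <stdexcept>
#include <string>

class ShmRegion {
 public:
  ShmRegion(std::string name, size_t byte_size)
      : name_(std::move(name)), byte_size_(byte_size) {
    int fd = shm_open(name_.c_str(), O_CREAT | O_RDWR, 0666);
    if (fd == -1) throw std::runtime_error("shm_open failed");
    if (ftruncate(fd, byte_size_) == -1) {
      close(fd);
      throw std::runtime_error("ftruncate failed");
    }
    base_ = mmap(nullptr, byte_size_, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (base_ == MAP_FAILED) throw std::runtime_error("mmap failed");
    // Here the agent would register {name_, 0, byte_size_} with TRTIS.
  }

  ~ShmRegion() {
    // Here the agent would first unregister the region from TRTIS.
    munmap(base_, byte_size_);
    shm_unlink(name_.c_str());  // the agent owns the region's lifetime
  }

  void* data() const { return base_; }
  size_t size() const { return byte_size_; }

 private:
  std::string name_;
  size_t byte_size_;
  void* base_;
};
```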
The master branch now has the initial implementation of shared-memory support for input tensors, along with some minimal testing.
Currently only the C++ client API supports shared memory (Python support is TBD, but you can always use gRPC to generate client code for many languages). The C++ API changes are here: https://github.com/NVIDIA/tensorrt-inference-server/commit/6d33c8ca8cf5ec7eece925bb997d7f81df6caabe#diff-906ebe14e6f98b22609d12ac8433acc0
An example application is: https://github.com/NVIDIA/tensorrt-inference-server/blob/master/src/clients/c%2B%2B/simple_shm_client.cc. The L0_simple_shared_memory_example test performs some minimal testing using that example application.