Changing batch size between requests with shared memory fails
See original GitHub issue.
When running two clients with different batch sizes one after the other, I get server errors during inference (using shared memory) about the expected byte sizes.
Everything works as long as the batch size stays consistent between clients, or when the client is the very first one. But whenever an earlier client has already used a different batch size on the server, the current request fails with something similar to this:
Client error:
Server error message: expected buffer size to be 2945760 bytes but gets 5891520 bytes in output tensor
Server error:
[trtserver.cc:1212] Infer failed: expected buffer size to be 2945760 bytes but gets 5891520 bytes in output tensor
This happens when trying batch size 16 after another client successfully ran batch size 8, finished, and unregistered its shared memory. Registering the shared memory with the new size seems to work, but running inference fails. Note that the expected size in the error (2945760 bytes) is exactly half of the provided one (5891520 bytes), i.e. it matches the earlier batch-8 request rather than the current batch-16 one. All clients use the same model, of course.
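For reference, here is a minimal sketch of the client-side workflow described above, written against the current tritonclient Python API (the original report predates the Triton rename and would have used the older tensorrtserver client, but the register/infer/unregister flow is the same). Model and tensor names are hypothetical, and the per-sample element count is chosen only so that the byte sizes match the numbers in the error message (8 × 368220 = 2945760 bytes, 16 × 368220 = 5891520 bytes):

```python
# Hedged sketch, not the reporter's actual client code.
import numpy as np
import tritonclient.grpc as grpcclient
import tritonclient.utils.shared_memory as shm

BATCH = 16                 # the second client uses 16; the first client used 8
ELEMS_PER_SAMPLE = 92055   # hypothetical; 92055 floats = 368220 bytes per sample
byte_size = BATCH * ELEMS_PER_SAMPLE * np.dtype(np.float32).itemsize

client = grpcclient.InferenceServerClient("localhost:8001")

# Create a system shared-memory region sized for this client's batch size and
# register it with the server (the previous client's region of a different
# size was already unregistered).
out_handle = shm.create_shared_memory_region("output_shm", "/output_shm", byte_size)
client.register_system_shared_memory("output_shm", "/output_shm", byte_size)

# Input sent inline for brevity; the reported failure concerns the output tensor.
inp = np.zeros((BATCH, ELEMS_PER_SAMPLE), dtype=np.float32)
inputs = [grpcclient.InferInput("INPUT0", list(inp.shape), "FP32")]
inputs[0].set_data_from_numpy(inp)

# Ask the server to write the output into the registered region; this is the
# request that fails when a previous client used a different batch size.
outputs = [grpcclient.InferRequestedOutput("OUTPUT0")]
outputs[0].set_shared_memory("output_shm", byte_size)

results = client.infer(model_name="some_model", inputs=inputs, outputs=outputs)

# Clean up, as the original clients did between runs.
client.unregister_system_shared_memory("output_shm")
shm.destroy_shared_memory_region(out_handle)
```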
Top GitHub Comments
Fixed the GRPC failure when using different batch sizes in the aforementioned PR. Thank you for bringing it to our attention.
Thanks @philipp-schmidt. Just re-visited this. (Had earlier tested with HTTP and it worked fine.) You were right: there was an issue in the GRPC server. Fixed in grpc_server.cc.
Summary of change here.