
Shared Memory client fails for batch size != 1

See original GitHub issue

Requests with a batch size of 1 and the matching input_byte_size calculation work as expected.
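For reference, the working baseline is the same snippet with a batch size of one (reconstructed from the failing snippets below, not copied verbatim from the issue):

int batch_size = 1;
options->SetBatchSize(batch_size);
size_t input_byte_size = 608 * 608 * 3 * sizeof(float) * batch_size;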

Increasing the batch size to two and multiplying input_byte_size by two leads to the following exception (modified version of simple_shm_client.cc):

int batch_size = 2;
options->SetBatchSize(batch_size);
// Total shared-memory size: one 608x608x3 float image per sample, times the batch.
size_t input_byte_size = 608 * 608 * 3 * sizeof(float) * batch_size;

failed setting shared memory input: [ 0] INVALID_ARG - The input '000_net' has shared memory of size 8871936 bytes while the expected size is 4435968 bytes
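For reference, a single sample is 608 × 608 × 3 × sizeof(float) = 608 × 608 × 3 × 4 = 4,435,968 bytes, so the 8,871,936 bytes in the error above is exactly the per-sample size times the batch size of two.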

So the expected size at registration time does not take batch size into account. If input_byte_size instead matches that expected per-sample size, a different sanity check fails at inference time:

int batch_size = 2;
options->SetBatchSize(batch_size);
// Per-sample size only: satisfies the registration check above, but not the run check.
size_t input_byte_size = 608 * 608 * 3 * sizeof(float);

error: unable to run model: [inference:0 6] INVALID_ARG - unexpected shared-memory size 4435968 for input '000_net', expecting 8871936 for model 'yolov3'

So here the batch size is taken into account for the expected byte size. The two sanity checks are inconsistent: the registration check expects the per-sample size, while the run check expects the full-batch size, so no single input_byte_size satisfies both when the batch size is not 1.
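To make the mismatch concrete, here is a minimal, self-contained C++ sketch. It is not Triton client code; the two expected_* values are simply lifted from the error messages above:

#include <cstddef>
#include <cstdio>

int main() {
  // One 608x608x3 float image, as in the snippets above.
  const size_t per_sample = 608 * 608 * 3 * sizeof(float);  // 4435968 bytes
  const size_t batch_size = 2;

  // What the shared-memory registration check expects (first error message).
  const size_t expected_at_registration = per_sample;            // 4435968
  // What the run-time check expects (second error message).
  const size_t expected_at_run = per_sample * batch_size;        // 8871936

  printf("registration expects %zu bytes, run expects %zu bytes\n",
         expected_at_registration, expected_at_run);
  // When batch_size != 1 the two values differ, so any input_byte_size that
  // satisfies one check necessarily fails the other.
  return 0;
}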

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

2 reactions
CoderHam commented, Aug 12, 2019

The fix to this is simple. I shall test it and create a new PR for it.

1 reaction
philipp-schmidt commented, Aug 13, 2019

Thanks, very much appreciated @CoderHam !

Read more comments on GitHub.

Top Results From Across the Web

Shared Memory client fails for batch size != 1 #544 - GitHub
The error shows that while I set the correct input_byte_size of the full batch (~8.8MB), the client sanity checks always expect batch size...

Why am I getting memory allocation error even on batch size ...
My model crashes with memory allocation error on tensor [1,16,1536,1536]. Using the equation given in the article above I've calculated the ...

Shared Memory Problem (unable to allocate ... - Ask TOM
Bind variables are SO MASSIVELY important -- I cannot in any way shape or form OVERSTATE their importance. Same with the PLSQL call...

Troubleshooting TensorFlow - TPU - Google Cloud
Batch size or model too large. Possible Cause of Memory Issue. When training a neural network on a CPU, GPU, or TPU, the...

CUDA C++ Best Practices Guide
Code samples throughout the guide omit error checking for conciseness. ... memory and the device memories of all installed supported devices share a...
