Java index out of bounds exception when running many requests through server

See original GitHub issue

Context

Trying to load test a TorchServe model to gauge the performance of a custom handler.

  • torchserve version: 0.2.0
  • torch version: 1.6.0
  • java version: openjdk 11.0.8
  • Operating System and version: Debian via the python 3.7-buster image.

Your Environment

  • Are you planning to deploy it using docker container? [yes/no]: yes
  • Is it a CPU or GPU environment?: CPU
  • Using a default/custom handler? custom
  • What kind of model is it e.g. vision, text, audio?: feed forward for custom input.
  • Are you planning to use local models from model-store or public url being used e.g. from S3 bucket etc.? from model store
  • Provide config.properties, logs [ts.log] and parameters used for model registration/update APIs: number of netty threads=32

Expected Behavior

I expected TorchServe not to throw this error, or at least to understand which properties of the environment I could change to address it. It only seems to happen under medium load.

Current Behavior

With a load of ~5 rps and varying batch sizes, CPU counts, and memory allocations, the server throws errors in ~4%+ of requests.
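For anyone trying to reproduce the error rate, a small fixed-rate harness along these lines may help. This is only a sketch: the report does not say which load-testing tool was used, and `run_load_test`, `send`, and the other parameter names are hypothetical. `send` stands for any zero-argument callable that issues one request and raises on failure.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(send, total_requests, rate_per_second, max_workers=8, sleep=time.sleep):
    """Submit requests at a fixed rate and return the fraction that failed.

    `send` is a hypothetical zero-argument callable that performs one request
    and raises an exception if the request fails.
    """
    interval = 1.0 / rate_per_second

    def attempt():
        # Convert "raised or not" into a boolean so results are easy to count.
        try:
            send()
            return True
        except Exception:
            return False

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = []
        for _ in range(total_requests):
            futures.append(pool.submit(attempt))
            sleep(interval)  # pace submissions to approximate the target rate
        results = [future.result() for future in futures]

    return results.count(False) / total_requests
```

In practice `send` could wrap an HTTP POST to the model's predictions endpoint; a returned value of 0.04 or more would correspond to the ~4% failure rate described above.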

Failure Logs [if any]

2020-10-17 00:16:41,887 [INFO ] epollEventLoopGroup-5-3 org.pytorch.serve.wlm.WorkerThread - 9002 Worker disconnected. WORKER_MODEL_LOADED
2020-10-17 00:16:41,887 [ERROR] epollEventLoopGroup-5-3 org.pytorch.serve.wlm.WorkerThread - Unknown exception
io.netty.handler.codec.DecoderException: java.lang.IndexOutOfBoundsException: readerIndex(1021) + length(4) exceeds writerIndex(1024): PooledUnsafeDirectByteBuf(ridx: 1021, widx: 1024, cap: 1024)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:471)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:404)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:371)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:354)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248)
    at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901)
    at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.IndexOutOfBoundsException: readerIndex(1021) + length(4) exceeds writerIndex(1024): PooledUnsafeDirectByteBuf(ridx: 1021, widx: 1024, cap: 1024)
    at io.netty.buffer.AbstractByteBuf.checkReadableBytes0(AbstractByteBuf.java:1477)
    at io.netty.buffer.AbstractByteBuf.readInt(AbstractByteBuf.java:810)
    at org.pytorch.serve.util.codec.ModelResponseDecoder.decode(ModelResponseDecoder.java:56)
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:501)

Thank you in advance for any help you can provide!

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments:38 (16 by maintainers)

Top GitHub Comments

1 reaction
punshriv commented, Dec 15, 2020

@harshbafna I debugged this further and observed the following: the Python backend sends the complete response for all the batched requests, but when the frontend server receives it, it is fragmented. For example, in the scenario below, with a total response size of 500777, the message decoder gets these fragments:

2020-12-15 03:36:08,577 [INFO ] W-9000-bert-base-nli-mean-tokens-embeddings_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - #DEBUG 500777
2020-12-15 03:36:08,577 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.util.codec.ModelResponseDecoder - #DEBUG SIZE 65536
2020-12-15 03:36:08,577 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.util.codec.ModelResponseDecoder - #DEBUG SIZE 131072
2020-12-15 03:36:08,577 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.util.codec.ModelResponseDecoder - #DEBUG SIZE 196608
2020-12-15 03:36:08,578 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.util.codec.ModelResponseDecoder - #DEBUG SIZE 262144
2020-12-15 03:36:08,578 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.util.codec.ModelResponseDecoder - #DEBUG SIZE 327680
2020-12-15 03:36:08,580 [ERROR] epollEventLoopGroup-5-1 org.pytorch.serve.wlm.WorkerThread - Unknown exception
io.netty.handler.codec.DecoderException: java.lang.IndexOutOfBoundsException: readerIndex(327678) + length(4) exceeds writerIndex(327680): PooledUnsafeDirectByteBuf(ridx: 327678, widx: 327680, cap: 524288)

I suspect the issue is caused by incorrect decoding of these fragments. What are your thoughts on this? Shouldn't the fragments be reassembled at a lower level and then decoded at the application level?
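The exception in the log is consistent with a length field straddling a fragment boundary: the decoder tried to read 4 bytes at index 327678 when only 2 had arrived. As a minimal sketch of fragment-tolerant decoding, written in Python for brevity, the idea is to buffer incoming bytes and decode only once a whole frame is present. The wire format here (a 4-byte big-endian length followed by the payload) is an assumption for illustration and is not claimed to be TorchServe's actual protocol; `FrameDecoder` and `encode_frame` are hypothetical names.

```python
import struct

# Assumed framing: 4-byte big-endian signed length, then the payload.
# Big-endian matches the default byte order of Netty's ByteBuf.readInt().
LENGTH_PREFIX = struct.Struct(">i")


def encode_frame(payload):
    """Prefix a payload with its length (hypothetical format)."""
    return LENGTH_PREFIX.pack(len(payload)) + payload


class FrameDecoder:
    """Accumulates stream fragments and returns only complete frames."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, fragment):
        """Append one fragment; return every frame that is now complete."""
        self._buffer.extend(fragment)
        frames = []
        while True:
            # The length prefix itself may be split across fragments.
            if len(self._buffer) < LENGTH_PREFIX.size:
                break
            (length,) = LENGTH_PREFIX.unpack_from(self._buffer, 0)
            end = LENGTH_PREFIX.size + length
            # Prefix is readable, but the payload has not fully arrived.
            if len(self._buffer) < end:
                break
            frames.append(bytes(self._buffer[LENGTH_PREFIX.size:end]))
            del self._buffer[:end]  # consume the decoded frame
        return frames


if __name__ == "__main__":
    # First frame occupies bytes 0..1020, so the next prefix starts at 1021
    # and a 1024-byte fragment cuts it after 3 bytes.
    stream = encode_frame(b"a" * 1017) + encode_frame(b"second")
    decoder = FrameDecoder()
    print(len(decoder.feed(stream[:1024])))
    print(decoder.feed(stream[1024:]))
```

The demo reproduces the shape of the original report (a 4-byte read starting at index 1021 of a 1024-byte buffer) without raising, because the decoder waits for more data instead of reading past what has arrived. In Netty the equivalent guard is to check `readableBytes()` before calling `readInt()`, and `LengthFieldBasedFrameDecoder` packages this pattern; whether either matches the fix applied upstream is not established here.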

1 reaction
punshriv commented, Dec 14, 2020

@harshbafna

Input to test:

curl -X POST \
  http://XXXXXXXXX/predictions/distilbert-base-uncased-distilled-squad-reader/3.0 \
  -H 'content-type: application/json' \
  -d '{ "q_id": "1", "text": "Who is the prime minister of India ?", "content": "Narendra Damodardas Modi is an Indian politician serving as the 14th and current Prime Minister of India since 2014" }'

This should return the following response:

{ "q_id": "1", "answer": "narendra damodardas modi" }

config.properties:

inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
number_of_netty_threads=32
job_queue_size=1000
model_store=/home/model-server/model-store
