High latency in PoolArena.allocateNormal()
Expected behavior
We expect Netty to take less than 20ms to complete a PoolArena.allocateNormal() call.
Actual behavior
We are observing ~100ms wall-time (~5ms CPU time) latency spikes in PoolArena.allocateNormal() calls. From debugging we see that most of the time is spent allocating a new chunk. Below is the time it took to allocate newChunk() and the stack trace at that point; we added our own logging to capture this:
2020-03-19T03:43:43.453-0700 INFO drift-client-135 stdout DirectArena.allocateDirect.ByteBuffer time: 141ms cpuTime: 2ms
2020-03-19T03:43:43.453-0700 INFO drift-client-135 stdout reqCapacity: 8388608 PoolArena.allocateNormal.newChunk time: 141ms cpuTime: 2ms
2020-03-19T03:43:43.455-0700 INFO drift-client-135 stdout PoolArena.allocateNormal.newChunk stack trace:
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:289)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:246)
at io.netty.buffer.PoolArena.reallocate(PoolArena.java:469)
at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:118)
at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:306)
at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:282)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1104)
at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:99)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:274)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:355)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:377)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.lang.Thread.run(Thread.java:844)
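For reference, here is a minimal standalone sketch (an assumption, not the exact instrumentation we used above) that times a large allocation against Netty's default pooled allocator; the 8MB request size mirrors the reqCapacity in the log line above, and on a fresh arena such a request typically goes through PoolArena.allocateNormal() and allocates a new chunk:

import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

public class AllocTiming {
    public static void main(String[] args) {
        // Default pooled allocator; a cold arena forces a new chunk allocation.
        PooledByteBufAllocator alloc = PooledByteBufAllocator.DEFAULT;

        long start = System.nanoTime();
        ByteBuf buf = alloc.directBuffer(8 * 1024 * 1024); // 8MB, same as reqCapacity above
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("directBuffer(8MB) took " + elapsedMs + "ms");
        buf.release();
    }
}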
Our setup: We have a client using a Thrift-on-Netty stack to communicate with a server. The server handles each request and returns the response (average size ~500KB) to the client. The client and server processes are on the same machine. The client sends thousands of requests per second to the server, and it opens a new connection for every request.
- It looks like we are doing a lot of 8MB and 12MB chunk allocations, and these are costly.
- Another question: why can't these allocations be served from the cache or released back into the cache? Is it because of the 32KB maximum cached buffer capacity?
- Which Netty configuration parameters should we use to avoid this large number of allocations? (See the allocator sketch after this list.)
- Can you help us tune Netty to our needs? Currently we do not set any Netty configuration parameters, so they all have their default values.
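For context on the tuning question above, the pooled allocator can be configured either through the io.netty.allocator.* system properties or by constructing a PooledByteBufAllocator explicitly and setting it on the bootstrap. A minimal sketch follows, assuming illustrative values rather than recommendations (the right page size, maxOrder, and cache sizes would need to be validated against the actual workload):

import io.netty.bootstrap.Bootstrap;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.channel.ChannelOption;

public class AllocatorTuning {
    public static void main(String[] args) {
        // Illustrative values only; chunkSize = pageSize << maxOrder (8192 << 11 = 16MB by default).
        PooledByteBufAllocator allocator = new PooledByteBufAllocator(
                true,                                            // preferDirect
                0,                                               // nHeapArena: skip heap arenas
                2 * Runtime.getRuntime().availableProcessors(),  // nDirectArena
                8192,                                            // pageSize (default)
                11,                                              // maxOrder
                512, 256, 64);                                   // tiny/small/normal cache sizes (defaults)

        // Wire the allocator into the client bootstrap so all channels use it.
        Bootstrap bootstrap = new Bootstrap()
                .option(ChannelOption.ALLOCATOR, allocator);

        // Note: the 32KB per-thread cache cap mentioned above is governed by the system
        // property io.netty.allocator.maxCachedBufferCapacity (default 32768); it is not
        // exposed through this constructor.
        System.out.println("chunkSize = " + allocator.metric().chunkSize());
    }
}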
Netty version
4.1.46
JVM version (java -version)
java version "10" 2018-03-20
Java(TM) SE Runtime Environment 18.3 (build 10+46)
Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10+46, mixed mode)
OS version (uname -a)
Linux Tue Oct 29 07:36:32 PDT 2019 x86_64 x86_64 x86_64 GNU/Linux
Top GitHub Comments
@NikhilCollooru So my suggestion is to try with these 3 events and possibly (unless you have a billion threads) using -t.
@franz1981 Thanks for the quick reply! Before I proceed, what are we trying to find through this profiling? I have used async-profiler before. Are we interested in the CPU usage profile or the allocation profile?