Latency increased over time due to Weak Processing?
Expected behavior
I’m doing a performance test with TCP client-server model.
The client sends requests with a fixed payload size (2 KB) to the server at a fixed rate (20k ops/s), and the concurrency level is less than or equal to 10. The client is configured to use ByteToMessageEncoder/Decoder, with PooledByteBufAllocator and direct memory. The client-side code looks something like this:
byte[] data = new byte[2048];
ByteBuf buffer = PooledByteBufAllocator.DEFAULT.buffer();
buffer.writeBytes(data);
channel.writeAndFlush(buffer);
The server reads the data from the request into a ByteBuf (also pooled) and replies immediately with the same payload from the request. The server-side final handler code looks something like this:
ByteBuf data = request.getContent();
ByteBuf buffer = PooledByteBufAllocator.DEFAULT.buffer();
data.readBytes(buffer);
channel.writeAndFlush(buffer);
For the first few hours, the average latency (measured every 5 s on the client side) was stable; it fluctuated within the same range. I have verified that there is no buffer leak on either the client side or the server side (unit tested with io.netty.ResourceLeakDetector at paranoid level). Since the request rate is limited, the latency is expected to remain stable over time.
Actual behavior
I noticed that the average latency had increased slowly; both the maximum and minimum latency had increased. I checked the GC logs at debug level on the client side. The Weak Processing time in the Post Evacuation Collection Set phase increased slowly, which caused the Pause Young time to increase. The number of old regions had increased too.
The GC log of the Weak Processing time looks like this:
GC(20): Weak Processing 0.0ms
GC(21): Weak Processing 0.1ms
GC(22): Weak Processing 0.0ms
GC(23): Weak Processing 0.1ms
GC(24): Weak Processing 0.1ms
GC(25): Weak Processing 0.1ms
...
GC(103): Weak Processing 0.2ms
GC(104): Weak Processing 0.2ms
GC(105): Weak Processing 0.2ms
GC(106): Weak Processing 0.3ms
...
GC(1000): Weak Processing 1.3ms
GC(1001): Weak Processing 1.2ms
GC(1002): Weak Processing 1.2ms
GC(1003): Weak Processing 1.2ms
The Post Evacuation Collection Set and Pause Young times increased by the same amount:
GC(20): Post Evacuation Collection Set 3.0ms
GC(21): Post Evacuation Collection Set 3.1ms
GC(22): Post Evacuation Collection Set 3.2ms
GC(23): Post Evacuation Collection Set 3.1ms
...
GC(103): Post Evacuation Collection Set 3.2ms
GC(104): Post Evacuation Collection Set 3.3ms
...
GC(1000): Post Evacuation Collection Set 4.3ms
GC(1001): Post Evacuation Collection Set 4.2ms
The heap size remained the same as at the start of the test. A mixed or full GC would be triggered as the old regions increased, and the latency would then increase unpredictably.
I have tried increasing the heap size and decreasing the allocation rate, but the Weak Processing time kept increasing, just more slowly.
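To quantify the trend, the phase times can be extracted from the unified GC log programmatically. Below is a minimal stdlib sketch (the class name and regex are illustrative, assuming log lines shaped like the excerpts above):

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WeakProcessingTrend {
    // Matches lines like "GC(1000): Weak Processing 1.3ms"
    static final Pattern PHASE =
            Pattern.compile("GC\\((\\d+)\\).*Weak Processing\\s+([0-9.]+)ms");

    public static void main(String[] args) {
        // In practice, read these lines from the GC log file.
        List<String> log = List.of(
                "GC(20): Weak Processing 0.0ms",
                "GC(1000): Weak Processing 1.3ms");
        for (String line : log) {
            Matcher m = PHASE.matcher(line);
            if (m.find()) {
                // Print "gcId phaseMillis" pairs for plotting the trend.
                System.out.println(m.group(1) + " " + m.group(2));
            }
        }
    }
}
```

Plotting the resulting pairs over time makes the slow, monotonic growth of the phase easy to confirm.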
Questions
What is Weak Processing, and why did the Weak Processing time increase? Is it the root cause of the latency increase? How can I fix this so that the average latency remains stable?
Netty version
4.1.51.Final
JVM version (e.g. java -version)
Oracle JDK 11.0.8
OS version (e.g. uname -a)
CentOS 7
Issue Analytics
- Created: 2 years ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
There are some weak references involved in the pooled allocator implementation. You can try disabling what is called the recycler with -Dio.netty.recycler.maxCapacity=0, or you can try not using a pooled allocator. I also recommend looking into upgrading your Netty version.

Let me close this, as no more data was provided.