Latency increased over time due to Weak Processing?

See original GitHub issue

Expected behavior

I’m doing a performance test with a TCP client-server model.

The client sends requests with a fixed payload size (2 KB) to the server at a fixed rate (20k ops/s), and the concurrency level is at most 10. The client is configured to use a ByteToMessageEncoder/Decoder pair with a PooledByteBufAllocator and direct memory. The client-side code looks something like this:

// Allocate a pooled buffer and copy the 2 KB payload into it.
byte[] data = new byte[2048];
ByteBuf buffer = PooledByteBufAllocator.DEFAULT.buffer();
buffer.writeBytes(data);
channel.writeAndFlush(buffer);
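
For context, a minimal sketch of how such a client bootstrap might be configured; the framing handlers, host, and port below are illustrative stand-ins for the custom ByteToMessageEncoder/Decoder and setup described above, not code from the original issue:

Bootstrap bootstrap = new Bootstrap();
bootstrap.group(new NioEventLoopGroup())
         .channel(NioSocketChannel.class)
         // Use the pooled allocator on the channel, as described above.
         .option(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)
         .handler(new ChannelInitializer<SocketChannel>() {
             @Override
             protected void initChannel(SocketChannel ch) {
                 ch.pipeline()
                   // Length-prefixed framing, standing in for the issue's custom codec.
                   .addLast(new LengthFieldBasedFrameDecoder(65536, 0, 4, 0, 4))
                   .addLast(new LengthFieldPrepender(4))
                   .addLast(new SimpleChannelInboundHandler<ByteBuf>() {
                       @Override
                       protected void channelRead0(ChannelHandlerContext ctx, ByteBuf msg) {
                           // Response received; latency would be recorded here (msg is released automatically).
                       }
                   });
             }
         });
Channel channel = bootstrap.connect("localhost", 8080).sync().channel();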

The server reads the data from the request into a ByteBuf (also pooled) and replies immediately with the same payload. The final server-side handler code looks something like this:

// Copy the request payload into a fresh pooled buffer and echo it back.
ByteBuf data = request.getContent();
ByteBuf buffer = PooledByteBufAllocator.DEFAULT.buffer();
buffer.writeBytes(data);
channel.writeAndFlush(buffer);
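
A fuller sketch of what such an echo handler might look like; the handler name is illustrative, and the inbound message is assumed to be a framed ByteBuf rather than the issue's actual request type:

public class EchoServerHandler extends SimpleChannelInboundHandler<ByteBuf> {
    @Override
    protected void channelRead0(ChannelHandlerContext ctx, ByteBuf request) {
        // Copy the request payload into a fresh buffer from the channel's (pooled) allocator and echo it back.
        ByteBuf response = ctx.alloc().buffer(request.readableBytes());
        response.writeBytes(request);
        ctx.writeAndFlush(response);
        // The inbound request buffer is released by SimpleChannelInboundHandler after this method returns.
    }

    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
        cause.printStackTrace();
        ctx.close();
    }
}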

For the first few hours, the average latency (measured every 5 s on the client side) was stable; it fluctuated within the same range. I’ve verified that there is no buffer leak on either the client side or the server side (unit tested with io.netty.ResourceLeakDetector at the paranoid level). As the request rate is limited, the latency is expected to remain stable over time.
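
For reference, paranoid-level leak detection can be enabled either with a JVM flag or programmatically before any buffers are allocated (standard Netty configuration, not code from the original issue):

// Equivalent to passing -Dio.netty.leakDetection.level=paranoid on the command line.
ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);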

Actual behavior

I noticed that the average latency had been increasing slowly; both the maximum and minimum latency had increased. I checked the GC logs at debug level on the client side. The Weak Processing time in the Post Evacuation Collection Set phase increased slowly, which caused the Pause Young time to increase. The number of old regions had increased too.
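
For reference, debug-level GC logs like the ones below (including the per-phase Weak Processing lines) can be produced on JDK 11 with a unified-logging flag such as the following (the log file name is arbitrary):

-Xlog:gc*=debug:file=gc.log:time,uptime,level,tags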

The GC log entries for Weak Processing time look like this:

GC(20): Weak Processing 0.0ms
GC(21): Weak Processing 0.1ms
GC(22): Weak Processing 0.0ms
GC(23): Weak Processing 0.1ms
GC(24): Weak Processing 0.1ms
GC(25): Weak Processing 0.1ms
...
GC(103): Weak Processing 0.2ms
GC(104): Weak Processing 0.2ms
GC(105): Weak Processing 0.2ms
GC(106): Weak Processing 0.3ms
...
GC(1000): Weak Processing 1.3ms
GC(1001): Weak Processing 1.2ms
GC(1002): Weak Processing 1.2ms
GC(1003): Weak Processing 1.2ms

The Post Evacuation Collection Set time and the Pause Young time increased by the same amount:

GC(20): Post Evacuation Collection Set 3.0ms
GC(21): Post Evacuation Collection Set 3.1ms
GC(22): Post Evacuation Collection Set 3.2ms
GC(23): Post Evacuation Collection Set 3.1ms
...
GC(103): Post Evacuation Collection Set 3.2ms
GC(104): Post Evacuation Collection Set 3.3ms
...
GC(1000): Post Evacuation Collection Set 4.3ms
GC(1001): Post Evacuation Collection Set 4.2ms

The heap size remained the same as at the start of the test. As the number of old regions grows, a mixed or full GC is eventually triggered, and the latency increases unpredictably.

I’ve tried increasing the heap size and decreasing the allocation rate, but the Weak Processing time kept increasing, just more slowly.

Questions

What is Weak Processing, and why did the Weak Processing time increase? Is it the root cause of the latency increase? How can I fix this so that the average latency remains stable?

Netty version

4.1.51.Final

JVM version (e.g. java -version)

Oracle JDK 11.0.8

OS version (e.g. uname -a)

CentOS 7

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
chrisvest commented, Oct 19, 2021

There are some weak references involved in the pooling allocator implementation. You can try disabling what’s called the recycler with -Dio.netty.recycler.maxCapacity=0, or you can try not using a pooling allocator. I also recommend looking into upgrading your Netty version.
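
Concretely, the two suggestions could look like this (illustrative only; bootstrap refers to whatever Bootstrap/ServerBootstrap the application already builds):

// Option 1: disable Netty's object recycler via a JVM argument:
//   -Dio.netty.recycler.maxCapacity=0

// Option 2: switch the channel to an unpooled allocator:
bootstrap.option(ChannelOption.ALLOCATOR, UnpooledByteBufAllocator.DEFAULT);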

0 reactions
normanmaurer commented, Nov 8, 2021

Let me close this as there was no more data provided
