EpollEventLoop.wakeup takes much longer with new netty versions

While upgrading from Netty 4.0.23 to 4.1.49, we saw much higher CPU consumption when sending data. We have 300-400 connections open, and the server sends a few hundred small packets per second to every client.

After the upgrade, our main threads (which send the packets using the method below) seem to ‘hang’ here: https://imgur.com/a/VMo9szT.

We send the packets like this:

Channel channel = /* ... */;
Packet packet = /* ... */;
EventLoop eventLoop = channel.eventLoop();
if (eventLoop.inEventLoop()) {
  /* ... */
} else {
  eventLoop.execute(() -> {
    ChannelFuture future = channel.writeAndFlush(packet);
    future.addListener(ChannelFutureListener.FIRE_EXCEPTION_ON_FAILURE);
  });
}

We are using the default options except for TCP_NODELAY. I saw that @njhill made multiple changes to the wait/wakeup logic. Could that be the cause of this?

OS: Linux HC-1 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64

Java: openjdk version "1.8.0_242", OpenJDK Runtime Environment (AdoptOpenJDK) (build 1.8.0_242-b08), OpenJDK 64-Bit Server VM (AdoptOpenJDK) (build 25.242-b08, mixed mode)

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
njhill commented, May 23, 2020

@SplotyCode a few questions:

  • Do you know whether overall performance (latency, etc.) was better or worse despite the CPU increase?
  • How about the rate of wakeups relative to rate of messages?
  • Are you able to compare the performance with NioEventLoop?
  • In your example code above, why do you schedule the writeAndFlush on the event loop (EL)? It can be called from outside the EL.

One hunch is that this is related to speed-ups on the event loop: it now completes iterations faster and returns to waiting before the next task comes in, whereas before it may have stayed awake long enough to see it. You could experiment with adding a Thread.yield() after the write in the scheduled task (ignoring the final bullet above).
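
To make the hunch above concrete, here is a toy model, not Netty's actual code. It assumes a single-threaded loop that blocks as soon as its queue is empty, and that every task arriving while the loop is blocked costs the submitting thread one wakeup. The class name, method name, and all timings are made up for illustration.

```java
/** Toy model of an event loop that blocks whenever it runs out of tasks. */
final class WakeupModel {

    /**
     * Counts how many submissions find the loop blocked and must wake it up.
     *
     * @param arrivalsNanos submission times in ascending order (hypothetical values)
     * @param serviceNanos  time the loop needs to process one task
     */
    static int countWakeups(long[] arrivalsNanos, long serviceNanos) {
        int wakeups = 0;
        long busyUntil = Long.MIN_VALUE; // the loop starts out blocked
        for (long arrival : arrivalsNanos) {
            if (arrival >= busyUntil) {
                // The loop already drained its queue and went back to waiting,
                // so this submission has to wake it up.
                wakeups++;
                busyUntil = arrival + serviceNanos;
            } else {
                // The loop is still busy; the task is queued behind the current work.
                busyUntil += serviceNanos;
            }
        }
        return wakeups;
    }

    public static void main(String[] args) {
        long[] arrivals = {0, 10_000, 20_000, 30_000, 40_000, 50_000};
        // Slow loop: each task takes longer than the gap between submissions.
        System.out.println("slow loop wakeups: " + countWakeups(arrivals, 15_000));
        // Fast loop: each task finishes well before the next submission.
        System.out.println("fast loop wakeups: " + countWakeups(arrivals, 2_000));
    }
}
```

In this model the slow loop is woken once and the fast loop is woken for every submission, which is the pattern the hypothesis describes. Whether the real workload behaves this way still has to be measured.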

0 reactions
njhill commented, May 24, 2020

This is unexpected behavior right?

Not really, it just means many tasks are submitted from outside the EL, each of which must have completed before the next is submitted. This is consistent with my hypothesis above. Experimenting with the Thread.yield() suggestion might give more clues. How many other threads are submitting these tasks? An alternative would be to limit the rate at which flush is called, per number of writes or number of microseconds (and probably per source thread). Calling write without flush won’t wake up the EL.
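
The count-based variant of that flush limiting could be sketched roughly as follows. This is a minimal illustration, not part of Netty: `FlushBatcher`, `Sink`, and `CountingSink` are hypothetical names, with `Sink` standing in for a channel's write and flush operations. The sketch is not thread-safe and assumes one batcher per source thread; a time-based limit is not shown.

```java
/** Flushes only after every N writes; intended to be used from a single thread. */
final class FlushBatcher {

    /** Hypothetical stand-in for a channel's write and flush operations. */
    interface Sink {
        void write(Object msg);
        void flush();
    }

    /** Demo sink that only counts calls, so the effect of batching is visible. */
    static final class CountingSink implements Sink {
        int writes;
        int flushes;
        @Override public void write(Object msg) { writes++; }
        @Override public void flush() { flushes++; }
    }

    private final Sink sink;
    private final int maxPendingWrites;
    private int pending;

    FlushBatcher(Sink sink, int maxPendingWrites) {
        if (maxPendingWrites < 1) {
            throw new IllegalArgumentException("maxPendingWrites must be >= 1");
        }
        this.sink = sink;
        this.maxPendingWrites = maxPendingWrites;
    }

    /** Writes the message and flushes once enough writes have accumulated. */
    void write(Object msg) {
        sink.write(msg);
        if (++pending >= maxPendingWrites) {
            flush();
        }
    }

    /** Flushes any pending writes; call this at the end of a burst. */
    void flush() {
        if (pending > 0) {
            pending = 0;
            sink.flush();
        }
    }

    public static void main(String[] args) {
        CountingSink sink = new CountingSink();
        FlushBatcher batcher = new FlushBatcher(sink, 4);
        for (int i = 0; i < 10; i++) {
            batcher.write("packet-" + i);
        }
        batcher.flush(); // push out the remaining two writes
        System.out.println(sink.writes + " writes, " + sink.flushes + " flushes");
    }
}
```

The cost of this design is latency: a message may sit unflushed until the threshold is reached or the caller flushes explicitly, so the threshold is a tuning knob rather than a fixed recommendation.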

It’s up to you to decide the latency/CPU tradeoff; the other extreme is to dedicate a core to the EL and use busy-wait.

BTW, though it will reduce some overhead, I don’t expect your change to call writeAndFlush directly to make much difference to the effects you’re observing, since it still schedules a task on the EL itself.
