EpollEventLoop.wakeup takes much longer with new netty versions
While upgrading from Netty 4.0.23 to 4.1.49 we experienced much higher CPU consumption when sending data. We have 300-400 connections open, and the server sends a few hundred small packets per second to every client.
After the upgrade, our main threads (which send the packets using the method below) seem to 'hang' here: https://imgur.com/a/VMo9szT.
We send the packets like this:
```java
Channel channel = /* ... */;
Packet packet = /* ... */;

EventLoop eventLoop = channel.eventLoop();
if (eventLoop.inEventLoop()) {
    /* ... */
} else {
    eventLoop.execute(() -> {
        ChannelFuture future = channel.writeAndFlush(packet);
        future.addListener(ChannelFutureListener.FIRE_EXCEPTION_ON_FAILURE);
    });
}
```
We are using the default options except for TCP_NODELAY. I saw that @njhill made multiple changes to the wait/wakeup logic; could that be the cause of this?
OS: Linux HC-1 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64
Java: openjdk version "1.8.0_242", OpenJDK Runtime Environment (AdoptOpenJDK) (build 1.8.0_242-b08), OpenJDK 64-Bit Server VM (AdoptOpenJDK) (build 25.242-b08, mixed mode)
Issue Analytics
- Created 3 years ago
- Comments: 5 (5 by maintainers)
Top GitHub Comments
@SplotyCode a few questions: does the same thing happen with `NioEventLoop`?

One hunch is that it could be related to speed-ups on the event loop, meaning it completes iterations faster and returns to waiting prior to the next task coming in, whereas before it may have stayed awake long enough to see it. You could experiment with adding a `Thread.yield()` after the write in the scheduled task (ignoring the final bullet above).
after the write in the scheduled task (ignoring final bullet above)Not really, it just means many tasks are submitted from outside the EL each of which must have completed prior to the next being submitted. This is consistent with my hypothesis above. Experimenting with the
Thread.yield()
suggestion might give more clues. How many other threads are submitting these tasks? An alternative would be to limit the rate that flush is called, per number of writes or number of microsecs (and probably per source thread). Calling write without flush won’t wake up the EL.It’s up to you to decide latency/cpu tradeoff, the other extreme is to dedicate a core to the EL and use busy-wait.
BTW, though it will reduce some overhead, I don't expect your change to call writeAndFlush directly will make much difference to the effects you're observing, since it still schedules a task on the EL itself.
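The busy-wait extreme mentioned above can be illustrated in plain Java (this is a toy consumer loop, not Netty's event loop): because the consumer never blocks, producers never pay a wakeup cost, at the price of pinning one core at 100%.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy illustration of a busy-wait task loop: the thread spins on the queue
// instead of parking in something like epoll_wait, so submitting a task is
// just an enqueue and never needs to wake the consumer.
final class BusySpinConsumer {
    private final ConcurrentLinkedQueue<Runnable> tasks = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean running = new AtomicBoolean(true);

    // Producers only enqueue; there is no wakeup path at all.
    void submit(Runnable task) {
        tasks.offer(task);
    }

    // Dedicate one thread (ideally one core) to this loop.
    void runLoop() {
        while (running.get()) {
            Runnable task = tasks.poll();
            if (task != null) {
                task.run();
            }
            // no park/wait here: spin straight back to poll()
        }
    }

    void shutdown() {
        running.set(false);
    }
}
```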