EpollEventLoop.wakeup takes much longer with new netty versions
While upgrading from Netty 4.0.23 to 4.1.49 we experienced much higher CPU consumption when sending data. We have 300-400 connections open, and the server sends a few hundred small packets per second to every client.
After the upgrade, our main threads (which send the packets using the method below) seem to 'hang' here: https://imgur.com/a/VMo9szT.
We send the packets like this:
```java
Channel channel = /* ... */;
Packet packet = /* ... */;

EventLoop eventLoop = channel.eventLoop();
if (eventLoop.inEventLoop()) {
    /* ... */
} else {
    eventLoop.execute(() -> {
        ChannelFuture future = channel.writeAndFlush(packet);
        future.addListener(ChannelFutureListener.FIRE_EXCEPTION_ON_FAILURE);
    });
}
```
We are using the default options except for TCP_NODELAY. I saw that @njhill made multiple changes to the wait/wakeup logic; could that be the cause of this?
OS: Linux HC-1 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1 (2020-01-26) x86_64
Java: openjdk version "1.8.0_242", OpenJDK Runtime Environment (AdoptOpenJDK) (build 1.8.0_242-b08), OpenJDK 64-Bit Server VM (AdoptOpenJDK) (build 25.242-b08, mixed mode)
Issue Analytics
- Created 3 years ago
- Comments: 5 (5 by maintainers)
Top GitHub Comments
@SplotyCode a few questions: does the same thing happen with `NioEventLoop`?

One hunch is that it could be related to speed-ups on the event loop, meaning it completes iterations faster and returns to waiting prior to the next task coming in, whereas before it may have stayed awake long enough to see it. You could experiment with adding a `Thread.yield()` after the write in the scheduled task (ignoring the final bullet above).
after the write in the scheduled task (ignoring final bullet above)Not really, it just means many tasks are submitted from outside the EL each of which must have completed prior to the next being submitted. This is consistent with my hypothesis above. Experimenting with the
Thread.yield()
suggestion might give more clues. How many other threads are submitting these tasks? An alternative would be to limit the rate that flush is called, per number of writes or number of microsecs (and probably per source thread). Calling write without flush won’t wake up the EL.It’s up to you to decide latency/cpu tradeoff, the other extreme is to dedicate a core to the EL and use busy-wait.
BTW, though it will reduce some overhead, I don't expect your change to call writeAndFlush directly will make much difference to the effects you're observing, since it still schedules a task on the EL itself.
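The busy-wait extreme mentioned above can be illustrated in plain Java (this is a toy consumer loop, not Netty's event loop): because the consumer never blocks, producers never pay a wakeup cost, at the price of pinning one core at 100%.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy illustration of a busy-wait task loop: the thread spins on the queue
// instead of parking in something like epoll_wait, so submitting a task is
// just an enqueue and never needs to wake the consumer.
final class BusySpinConsumer {
    private final ConcurrentLinkedQueue<Runnable> tasks = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean running = new AtomicBoolean(true);

    // Producers only enqueue; there is no wakeup path at all.
    void submit(Runnable task) {
        tasks.offer(task);
    }

    // Dedicate one thread (ideally one core) to this loop.
    void runLoop() {
        while (running.get()) {
            Runnable task = tasks.poll();
            if (task != null) {
                task.run();
            }
            // no park/wait here: spin straight back to poll()
        }
    }

    void shutdown() {
        running.set(false);
    }
}
```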