
Epoll consuming a lot more CPU than Nio


I am building a websocket-based broker. During load testing, we found that the Epoll transport uses around 55% CPU compared to around 20% for Nio, just for maintaining the connections, without doing any business-specific IO on them. Is this expected? What could I be doing wrong? Happy to share any more information you need.

  • Total concurrent connections: around 27K
  • Boss threads: 1
  • Worker threads: 32
  • Cores in the VM: 8
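As a rough back-of-envelope check on these numbers: the DefaultEventExecutorChooserFactory used in the reporter's setup code hands each newly registered channel to the next worker event loop in round-robin order. The sketch below assumes exactly 27,000 connections (the report says "around 27K"), assumes none close during the test, and ignores the boss loop. The class and method names are illustrative only.

```java
import java.util.Arrays;

/**
 * Back-of-envelope model (not Netty code): spread ~27K channels over 32 worker
 * event loops in round-robin order and count event-loop threads per core.
 */
class EventLoopLoad {

    /** Assigns {@code connections} channels round-robin and returns the per-loop counts. */
    static int[] roundRobin(int connections, int loops) {
        int[] perLoop = new int[loops];
        for (int i = 0; i < connections; i++) {
            perLoop[i % loops]++; // the i-th registered channel goes to loop (i mod loops)
        }
        return perLoop;
    }

    public static void main(String[] args) {
        int connections = 27_000; // assumption: "around 27K" taken as exactly 27,000
        int workerThreads = 32;
        int cores = 8;

        int[] perLoop = roundRobin(connections, workerThreads);
        int min = Arrays.stream(perLoop).min().getAsInt();
        int max = Arrays.stream(perLoop).max().getAsInt();

        System.out.println("channels per event loop: " + min + ".." + max);
        System.out.println("event-loop threads per core: " + (workerThreads / cores));
    }
}
```

With these inputs each worker loop owns 843 or 844 channels, and 32 worker threads on 8 cores means 4 event-loop threads per core.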

Relevant code that sets up Netty:

// Excerpt from the reporter's setup code. Fields such as acceptorThreads, workerThreads,
// metricRegistry, bossPool, workerPool, channelType, channelInitializer and serverGroup,
// and the ThreadFactoryUtil helper, are defined elsewhere in the application.
EventExecutorChooserFactory chooserFactory = DefaultEventExecutorChooserFactory.INSTANCE;

ThreadFactory bossThreadFactory = ThreadFactoryUtil.createInstrumented("boss", metricRegistry);

// Worker threads are created by an affinity-aware, instrumented thread factory.
Executor workerExecutor = new ThreadPerTaskExecutor(
        ThreadFactoryUtil.createAffinityThreadFactory("worker.thread", metricRegistry));

private void setupNioEventLoopGroups(EventExecutorChooserFactory chooserFactory, ThreadFactory bossThreadFactory,
        Executor workerExecutor) {
    bossPool = new NioEventLoopGroup(acceptorThreads, bossThreadFactory);
    workerPool = new NioEventLoopGroup(workerThreads, workerExecutor, chooserFactory, SelectorProvider.provider(),
            DefaultSelectStrategyFactory.INSTANCE);
    channelType = NioServerSocketChannel.class;
    log.info("Initializing Java NIO Event System");
}

private void setupEpollEventLoopGroups(EventExecutorChooserFactory chooserFactory, ThreadFactory bossThreadFactory,
        Executor workerExecutor) {
    bossPool = new EpollEventLoopGroup(acceptorThreads, bossThreadFactory);
    workerPool = new EpollEventLoopGroup(workerThreads, workerExecutor, chooserFactory,
            DefaultSelectStrategyFactory.INSTANCE);
    channelType = EpollServerSocketChannel.class;
    log.info("Initializing Epoll IO Event System");
}

ServerBootstrap serverBootstrap = new ServerBootstrap().group(bossPool, workerPool);

// Options for the listening (server) socket.
Map<ChannelOption<?>, Object> channelOptions = new HashMap<>();
channelOptions.put(ChannelOption.SO_BACKLOG, 256);
channelOptions.put(ChannelOption.ALLOCATOR, new PooledByteBufAllocator(true));
// SO_TIMEOUT is a blocking-socket (OIO) option; the NIO and epoll server channels are not expected to apply it.
channelOptions.put(ChannelOption.SO_TIMEOUT, 3000);

channelOptions.forEach(
        (key, value) -> serverBootstrap.option(ChannelOption.valueOf(String.valueOf(key)), value));

// Options for each accepted (child) connection.
serverBootstrap.childOption(ChannelOption.TCP_NODELAY, true);
serverBootstrap.childOption(ChannelOption.SO_KEEPALIVE, true);
serverBootstrap.childOption(ChannelOption.SO_LINGER, -1);
serverBootstrap.childOption(ChannelOption.SO_REUSEADDR, true);
serverBootstrap.childOption(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT);
serverBootstrap.childOption(ChannelOption.ALLOW_HALF_CLOSURE, false);
serverBootstrap.childOption(ChannelOption.SO_SNDBUF, 10 * 1024);
serverBootstrap.channel(serverGroup.getChannelType());
serverBootstrap.childHandler(channelInitializer);

Netty version

4.1.68

JVM version (e.g. java -version)

openjdk version "11.0.12" 2021-07-20 LTS
OpenJDK Runtime Environment Zulu11.50+19-CA (build 11.0.12+7-LTS)
OpenJDK 64-Bit Server VM Zulu11.50+19-CA (build 11.0.12+7-LTS, mixed mode)

OS version (e.g. uname -a)

Linux bolt-004 5.8.0-1041-azure #44~20.04.1-Ubuntu SMP Fri Aug 20 20:41:09 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

[Screenshot: Epoll, 2021-09-20 at 6:36:30 PM]

[Screenshot: Nio, 2021-09-20 at 6:22:49 PM]

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 30 (20 by maintainers)

Top GitHub Comments

chrisvest commented, Oct 14, 2021 (2 reactions)

Only using epoll_wait when the timeout is greater than a millisecond will cause rounding in timeouts, though. For 200 millisecond timeouts that probably won't matter, but it might for a 1.5 millisecond timeout. So we'd need to pick a cut-off point. On kernel 5.11 and newer, we can use epoll_pwait2, which takes a timespec parameter.
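To make the rounding concern concrete: epoll_wait(2) takes its timeout as an int number of milliseconds, while epoll_pwait2(2) takes a struct timespec. The sketch below is a model of the arithmetic only, not Netty's code; it assumes the millisecond value is rounded up so the event loop never wakes early (rounding down would instead wake it early). The class and method names are hypothetical.

```java
/**
 * Model of timeout rounding (hypothetical names, not Netty code).
 * epoll_wait(2) accepts whole milliseconds; epoll_pwait2(2) accepts {tv_sec, tv_nsec}.
 */
class TimeoutRounding {

    static final long NANOS_PER_MILLI = 1_000_000L;
    static final long NANOS_PER_SECOND = 1_000_000_000L;

    /** Millisecond timeout for epoll_wait, rounded up so the wakeup is never early. */
    static long epollWaitMillis(long timeoutNanos) {
        return (timeoutNanos + NANOS_PER_MILLI - 1) / NANOS_PER_MILLI;
    }

    /** {tv_sec, tv_nsec} as a timespec for epoll_pwait2 would carry it: no rounding needed. */
    static long[] timespec(long timeoutNanos) {
        return new long[] {timeoutNanos / NANOS_PER_SECOND, timeoutNanos % NANOS_PER_SECOND};
    }

    /** How late the wakeup is after rounding up, as a fraction of the requested timeout. */
    static double relativeOvershoot(long timeoutNanos) {
        long roundedNanos = epollWaitMillis(timeoutNanos) * NANOS_PER_MILLI;
        return (double) (roundedNanos - timeoutNanos) / timeoutNanos;
    }

    public static void main(String[] args) {
        long[] examples = {200_000_000L, 200_500_000L, 1_500_000L};
        for (long nanos : examples) {
            System.out.printf("%.1f ms -> epoll_wait %d ms (%.1f%% late)%n",
                    nanos / 1e6, epollWaitMillis(nanos), 100 * relativeOvershoot(nanos));
        }
    }
}
```

Rounding a roughly 200 ms timeout up adds less than 1 ms (well under 1%), whereas a 1.5 ms timeout becomes 2 ms, about 33% late. That gap is why a cut-off point is needed if epoll_wait's millisecond timeout is used.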

normanmaurer commented, Oct 13, 2021 (2 reactions)

I wonder if the time used by timerfd_settime is an indicator that we might be better off not using it when possible (when the timeout is in milliseconds) and just calling epoll_wait(...) with the right timeout. WDYT?
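The two comments above suggest that the epoll transport arms a timerfd via timerfd_settime before waiting, and weigh skipping that syscall when a millisecond epoll_wait timeout is precise enough. A minimal sketch of such a decision, under stated assumptions: the enum, method names and the 10 ms cut-off are hypothetical and illustrative only (the comments leave the cut-off open), and this is not how Netty actually chooses.

```java
/**
 * Hypothetical wait-strategy selection combining the two comments above.
 * All names and the cut-off value are assumptions, not Netty API.
 */
class WaitStrategyChooser {

    enum WaitStrategy { EPOLL_PWAIT2_TIMESPEC, EPOLL_WAIT_MILLIS, TIMERFD_SETTIME }

    // Assumed cut-off: below this, millisecond rounding is treated as too coarse.
    static final long CUTOFF_NANOS = 10_000_000L; // 10 ms, illustrative only

    static boolean kernelAtLeast(int major, int minor, int wantMajor, int wantMinor) {
        return major > wantMajor || (major == wantMajor && minor >= wantMinor);
    }

    static WaitStrategy choose(long timeoutNanos, int kernelMajor, int kernelMinor) {
        if (kernelAtLeast(kernelMajor, kernelMinor, 5, 11)) {
            return WaitStrategy.EPOLL_PWAIT2_TIMESPEC; // timespec timeout, no ms rounding
        }
        if (timeoutNanos >= CUTOFF_NANOS) {
            return WaitStrategy.EPOLL_WAIT_MILLIS;     // skip the timerfd_settime syscall
        }
        return WaitStrategy.TIMERFD_SETTIME;           // keep sub-millisecond precision
    }

    public static void main(String[] args) {
        // Kernel 5.8 matches the reporter's uname output (5.8.0-1041-azure).
        System.out.println(choose(200_000_000L, 5, 8));
        System.out.println(choose(1_500_000L, 5, 8));
        System.out.println(choose(1_500_000L, 5, 11));
    }
}
```

On the reporter's 5.8 kernel this would use a plain epoll_wait timeout for a 200 ms deadline and keep the timerfd path for a 1.5 ms one; epoll_pwait2 only becomes an option from 5.11.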


Top Results From Across the Web

Why is epoll.wait using so much CPU time? : r/admincraft
My theory has always been that the chief factor regarding cpu usage is entities, yet I don't see an entity class taking up...

Norman Maurer on Twitter: "Much ❤️ for people who include ...
Epoll consuming lot more CPU than Nio · Issue #11695 · netty/netty ... we found that Epoll transport uses around 55% CPU compared...

cpu overload with NIO - are sources available? — oracle-tech
I am experiencing a CPU issue with NIO, basically the CPU goes 100% in the following method:

java - High CPU utilization for threads that seem to be waiting
A thread in the runnable state is executing in the Java virtual machine but it may be waiting for other resources from the...

Using select(2) the right way - Hacker News
So I would guess that while epoll requires more syscalls for setup it should be a lot more efficient if a FD is...
