epoll_wait produces an EINVAL error since 4.1.30

Expected behavior

epoll_wait should work in 4.1.30 like it did in 4.1.29

Actual behavior

Since switching to 4.1.30, EpollEventLoop’s handleLoopException is triggered with io.netty.channel.ChannelException: timerfd_settime() failed: Invalid argument. As a result, an epoll thread ends up “blocked”, sleeping.

The issue is visible in a Spring Boot test dealing with bad SSL certificates, which uses reactor/reactor-netty.

While investigating this remotely with limited resources (partial access to the logs and to the reproduction case, no local Linux machine to test on), I noticed that 4.1.30, suspiciously, included an issue related to epoll_wait.

Looking at the PR I think I might have found the regression:

https://github.com/netty/netty/pull/7816/files#diff-db3e069239a403b954e3ebc024ba9507R251

Integer.MAX_VALUE should be MAX_SCHEDULED_TIMERFD_NS (999,999,999), as it was before the PR; otherwise the nanosecond value passed to timerfd_settime can be too large, and the call might return EINVAL.
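
To illustrate the suspected regression, here is a minimal sketch of the delay split. It is not Netty's actual code: the class TimerfdDelay and its methods are hypothetical, and only the constant name MAX_SCHEDULED_TIMERFD_NS is borrowed from the description above. It assumes a non-negative delay and relies on the fact that timerfd_settime rejects a nanosecond field outside the range 0 to 999,999,999 with EINVAL.

```java
// Hypothetical illustration; not Netty's actual implementation.
final class TimerfdDelay {
    // Largest value allowed in the nanosecond field of a timespec.
    static final int MAX_SCHEDULED_TIMERFD_NS = 999_999_999;
    static final long NANOS_PER_SECOND = 1_000_000_000L;

    /**
     * Splits a non-negative delay into {seconds, nanos}.
     * The seconds part is clamped to the int range; the leftover
     * nanoseconds are clamped to nanosCap.
     */
    static int[] split(long totalDelayNanos, int nanosCap) {
        int seconds = (int) Math.min(totalDelayNanos / NANOS_PER_SECOND, Integer.MAX_VALUE);
        // When seconds was clamped, this remainder can exceed one second.
        long remainder = totalDelayNanos - seconds * NANOS_PER_SECOND;
        int nanos = (int) Math.min(remainder, nanosCap);
        return new int[] {seconds, nanos};
    }

    /** True if the value is acceptable as a timespec nanosecond field. */
    static boolean isValidNanos(int nanos) {
        return nanos >= 0 && nanos <= MAX_SCHEDULED_TIMERFD_NS;
    }

    public static void main(String[] args) {
        long hugeDelay = Long.MAX_VALUE; // stands in for a very distant deadline

        int[] capped = split(hugeDelay, MAX_SCHEDULED_TIMERFD_NS);
        int[] uncapped = split(hugeDelay, Integer.MAX_VALUE);

        System.out.println("capped nanos valid:   " + isValidNanos(capped[1]));
        System.out.println("uncapped nanos valid: " + isValidNanos(uncapped[1]));
    }
}
```

With an ordinary delay both caps give the same result. The difference only shows up when the delay is so large that the seconds part is clamped, which would be consistent with the error appearing only in some scenarios.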

Steps to reproduce

The issue is triggered during Spring Boot's tests, but here is a smaller reproduction snippet that uses Spring Framework 5:

@Test
public void strippedDown() {
    assertThatExceptionOfType(RuntimeException.class)
            .isThrownBy(() -> WebClient.create().get()
                    .uri("https://" + "self-signed.badssl.com/").exchange()
                    .block(Duration.ofSeconds(10)))
            .withCauseInstanceOf(SSLException.class);
}

If you need it, I can try to spin up a repository with a Maven project that reproduces the issue and can be run without any setup.

Netty version

4.1.30

JVM version (e.g. java -version)

??

OS version (e.g. uname -a)

??

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 20 (17 by maintainers)

Top GitHub Comments

wilkinsona commented, Oct 11, 2018 (1 reaction)

I can fill in some of the blanks about the environment where we’ve seen the issue:

JVM version (e.g. java -version)

openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-8u181-b13-1~deb9u1-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)

OS version (e.g. uname -a)

Linux e4fff31d-4e97-44d3-5088-942369c43954 4.4.0-134-generic #160~14.04.1-Ubuntu SMP Fri Aug 17 11:07:07 UTC 2018 x86_64 GNU/Linux

normanmaurer commented, Oct 29, 2018 (0 reactions)

@wilkinsona I would open a new one as the error itself is gone (I was also not able to reproduce yet 😦 ).
