Netty Pipeline processes flush operation out of order


During an attempted upgrade of Netty we ran into an issue. Starting in version 4.1.35, it appears that channel.flush() sometimes ends up being processed before a channel.write(), even when it is called after the write(). These calls happen on a thread outside the Netty pipeline.

Calling channel.writeAndFlush() seems to fix the issue, but I'm not sure what else might be impacted.

Expected behavior

On a thread external to the Netty pipeline:

channel.write()
channel.flush()

works the same as

channel.writeAndFlush()

Actual behavior

channel.flush() after a few channel.write() operations may get processed by the pipeline before the writes, causing the connection to hang without flushing data.

Steps to reproduce

I assume the trigger here is writing to the channel from an outside thread.

Minimal yet complete reproducer code (or URL to code)

Unfortunately I can't share the current code that causes it, and I haven't yet come up with a minimal demo.

Netty version

works in 4.1.34, broken in 4.1.35

JVM version (e.g. java -version)

1.8.0_202

OS version (e.g. uname -a)

macos Catalina 10.15.5 Darwin Kernel Version 19.5.0: Tue May 26 20:41:44 PDT 2020; root:xnu-6153.121.2~2/RELEASE_X86_64 x86_64

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 15 (9 by maintainers)

Top GitHub Comments

aikar commented, Jun 20, 2020 (4 reactions)

Feels pretty expected, as all this does is invoke 2 schedules to the event loop, and each can end up on a different thread.

If you want to send many writes and a single flush, you should jump into the event loop first then send it all on the same thread.
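A minimal sketch of that advice, using only java.util.concurrent rather than Netty itself. A single-threaded executor stands in for the channel's event loop, and all writes plus the final flush are submitted as one task so they run back to back on the same thread. The names EventLoopBatch and writeThenFlush are hypothetical. In Netty the equivalent hop would be channel.eventLoop().execute(...) with the write() and flush() calls inside the task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class EventLoopBatch {

    // Submits every write and the trailing flush as ONE task, so the
    // single-threaded "event loop" runs them in order with nothing in between.
    static List<String> writeThenFlush(ExecutorService eventLoop, List<String> messages)
            throws Exception {
        List<String> log = new ArrayList<>();
        eventLoop.submit(() -> {
            for (String m : messages) {
                log.add("write:" + m); // stands in for channel.write(m)
            }
            log.add("flush");          // stands in for channel.flush()
        }).get(); // wait for the task; get() also makes the log visible to the caller
        return log;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        try {
            // prints [write:a, write:b, write:c, flush]
            System.out.println(writeThenFlush(eventLoop, Arrays.asList("a", "b", "c")));
        } finally {
            eventLoop.shutdown();
        }
    }
}
```

Because the whole batch is one task, no other executor is involved and the flush cannot overtake the writes.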

efiala commented, Apr 29, 2021 (0 reactions)

Looking at the difference between 4.1.34.Final and 4.1.35.Final, where @jwils reported the issue first occurred, I notice that in AbstractChannelHandlerContext the signature of findContextOutbound has changed to include an int mask parameter. The body of that method has changed as follows:

- private AbstractChannelHandlerContext findContextOutbound() {
+ private AbstractChannelHandlerContext findContextOutbound(int mask) {
      AbstractChannelHandlerContext ctx = this;
      do {
          ctx = ctx.prev;
-     } while (!ctx.outbound);
+     } while ((ctx.executionMask & mask) == 0);
      return ctx;
  }

In the write method we see that we get an executor from the channel handler context via EventExecutor executor = next.executor();. Previously, next was just whatever matched ctx.outbound for both flush and write, but it is now potentially different for the two:

    private void write(Object msg, boolean flush, ChannelPromise promise) {
        ObjectUtil.checkNotNull(msg, "msg");
        try {
            if (isNotValidPromise(promise, true)) {
                ReferenceCountUtil.release(msg);
                // cancelled
                return;
            }
        } catch (RuntimeException e) {
            ReferenceCountUtil.release(msg);
            throw e;
        }

-       AbstractChannelHandlerContext next = findContextOutbound();
+       final AbstractChannelHandlerContext next = findContextOutbound(flush ?
+               (MASK_WRITE | MASK_FLUSH) : MASK_WRITE);
        final Object m = pipeline.touch(msg, next);
        EventExecutor executor = next.executor();
        if (executor.inEventLoop()) {
            if (flush) {
                next.invokeWriteAndFlush(m, promise);
            } else {
                next.invokeWrite(m, promise);
            }
        } else {
            final AbstractWriteTask task;
            if (flush) {
                task = WriteAndFlushTask.newInstance(next, m, promise);
            }  else {
                task = WriteTask.newInstance(next, m, promise);
            }
            if (!safeExecute(executor, task, promise, m)) {
                // We failed to submit the AbstractWriteTask. We need to cancel it so we decrement the pending bytes
                // and put it back in the Recycler for re-use later.
                //
                // See https://github.com/netty/netty/issues/8343.
                task.cancel();
            }
        }
    }

Because the executor may now be different, sequential write and flush invocations made from the same thread may be enqueued on different executors.
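To illustrate why two executors remove the ordering guarantee, here is a small sketch built on plain java.util.concurrent executors, not Netty. The class name TwoExecutorsSketch is hypothetical, and the latch deliberately forces the unlucky scheduling in which the write executor is still busy when the flush arrives. In a real pipeline the reordering would depend on timing.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TwoExecutorsSketch {

    static List<String> run() throws Exception {
        ExecutorService writeExecutor = Executors.newSingleThreadExecutor();
        ExecutorService flushExecutor = Executors.newSingleThreadExecutor();
        List<String> log = new CopyOnWriteArrayList<>();
        CountDownLatch busy = new CountDownLatch(1);
        try {
            // Keep the write executor occupied, as if it were handling earlier work.
            writeExecutor.execute(() -> {
                try {
                    busy.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            // The caller submits the write first and the flush second, from one thread.
            Future<?> write = writeExecutor.submit(() -> { log.add("write"); });
            Future<?> flush = flushExecutor.submit(() -> { log.add("flush"); });

            flush.get();      // the flush finishes while the write is still queued
            busy.countDown(); // now let the write executor proceed
            write.get();
            return log;
        } finally {
            busy.countDown(); // no-op if already released; avoids a stuck thread on failure
            writeExecutor.shutdown();
            flushExecutor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // prints [flush, write]
    }
}
```

Each executor preserves the order of its own queue, but nothing orders one queue relative to the other.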

Note, though, that when the flush parameter of the write method is true, as it is for writeAndFlush, the lookup uses the mask MASK_WRITE | MASK_FLUSH. It will therefore potentially (and possibly always, depending on the ordering of the contexts in the context chain) select the same context as in the pure write case.

This may help explain why a write might be executed after a flush even when both are called from the same thread, and why writeAndFlush mitigates the issue.
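The mask-based lookup described above can be sketched with a toy model. This is not Netty's code: Ctx is a simplified stand-in for a handler context, the bit values of MASK_WRITE and MASK_FLUSH are illustrative only, and the three-context pipeline (a write-only handler bound to its own executor) is a hypothetical layout chosen to make the effect visible.

```java
public class MaskLookupSketch {
    // Illustrative bit values only; Netty defines its own constants internally.
    static final int MASK_WRITE = 1;
    static final int MASK_FLUSH = 1 << 1;

    // Simplified stand-in for a handler context.
    static final class Ctx {
        final String name;
        final int executionMask; // operations this context's handler implements
        final String executor;   // label for the executor the context runs on
        final Ctx prev;          // next context toward the head of the pipeline

        Ctx(String name, int executionMask, String executor, Ctx prev) {
            this.name = name;
            this.executionMask = executionMask;
            this.executor = executor;
            this.prev = prev;
        }
    }

    // Same loop shape as the 4.1.35 diff: skip contexts whose handler
    // implements none of the operations selected by the mask.
    static Ctx findContextOutbound(Ctx from, int mask) {
        Ctx ctx = from;
        do {
            ctx = ctx.prev;
        } while ((ctx.executionMask & mask) == 0);
        return ctx;
    }

    // Hypothetical pipeline: head <- writeOnly <- tail.
    static String resolveFromTail(int mask) {
        Ctx head = new Ctx("head", MASK_WRITE | MASK_FLUSH, "executor-A", null);
        Ctx writeOnly = new Ctx("writeOnly", MASK_WRITE, "executor-B", head);
        Ctx tail = new Ctx("tail", 0, "executor-A", writeOnly);
        Ctx found = findContextOutbound(tail, mask);
        return found.name + "@" + found.executor;
    }

    public static void main(String[] args) {
        System.out.println("write         -> " + resolveFromTail(MASK_WRITE));
        System.out.println("flush         -> " + resolveFromTail(MASK_FLUSH));
        System.out.println("writeAndFlush -> " + resolveFromTail(MASK_WRITE | MASK_FLUSH));
    }
}
```

In this layout a plain write resolves to writeOnly on executor-B while a plain flush skips it and resolves to head on executor-A, so the two calls are queued on different executors. The combined mask resolves to the same context as the plain write, which is consistent with writeAndFlush avoiding the reordering.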

https://github.com/netty/netty/compare/netty-4.1.34.Final...netty-4.1.35.Final
