Netty Pipeline processes flush operation out of order
During an attempted upgrade of Netty we ran into an issue starting in version 4.1.35: sometimes channel.flush() ends up getting processed before a channel.write(), even when it is called after the write(). These calls happen on a thread outside the Netty pipeline.
Calling channel.writeAndFlush() seems to fix the issue, but I'm not sure what else might be impacted.
Expected behavior
On a thread external to the Netty pipeline:
channel.write()
channel.flush()
works the same as
channel.writeAndFlush()
Actual behavior
channel.flush() after a few channel.write() operations may get processed by the pipeline before the writes, causing the connection to hang without flushing data.
Steps to reproduce
I assume the trigger here is writing to the channel from an outside thread.
Minimal yet complete reproducer code (or URL to code)
Unfortunately I can't share the current code that causes it, and I haven't yet come up with a minimal demo.
Netty version
works in 4.1.34, broken in 4.1.35
JVM version (e.g. java -version)
1.8.0_202
OS version (e.g. uname -a)
macOS Catalina 10.15.5 Darwin Kernel Version 19.5.0: Tue May 26 20:41:44 PDT 2020; root:xnu-6153.121.2~2/RELEASE_X86_64 x86_64
Top GitHub Comments
Feels pretty expected, as all this does is invoke 2 schedules to the event loop, and each can end up on a different thread.
If you want to send many writes and a single flush, you should jump into the event loop first then send it all on the same thread.
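A minimal sketch of that advice, using only java.util.concurrent so it runs without Netty on the classpath. The single-threaded executor is a stand-in for the channel's event loop, and the class and method names are made up for illustration. The point is that all writes and the final flush are submitted as one task, so nothing can be scheduled between them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Stdlib-only model: a single-threaded executor stands in for the event loop.
// This is an illustration of the pattern, not Netty code.
final class BatchedWriteSketch {

    // Submits every write plus the final flush as ONE task, so the
    // operations are processed in exactly the order written here.
    static List<String> writeAllThenFlush(List<String> messages) {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        List<String> processed = new ArrayList<>(); // mutated only on the loop thread
        try {
            eventLoop.submit(() -> {
                for (String msg : messages) {
                    processed.add("write:" + msg);
                }
                processed.add("flush");
            }).get(5, TimeUnit.SECONDS); // wait so the caller can safely read the list
        } catch (Exception e) {
            throw new IllegalStateException("batched task did not complete", e);
        } finally {
            eventLoop.shutdown();
        }
        return processed;
    }

    public static void main(String[] args) {
        // prints [write:a, write:b, write:c, flush]
        System.out.println(writeAllThenFlush(Arrays.asList("a", "b", "c")));
    }
}
```

With a real channel the same shape would presumably be a single channel.eventLoop().execute(...) task whose body calls channel.write(...) for each message and then channel.flush().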
Looking at the difference between 4.1.34.Final and 4.1.35.Final, where @jwils reported the issue first occurred, I notice in AbstractChannelHandlerContext that the signature of findContextOutbound has changed to include an int mask parameter. The body of that method has changed as follows:

In the write method we see that we get an executor from the channel handler context via EventExecutor executor = next.executor();. next was previously just whatever matched ctx.outbound for both flush and write, but is now potentially different for these:

Because the executor may now be different, sequential write and flush invocations made in the same thread may be enqueued on different executors.

Though note that if the flush parameter of the write method is true, as is the case for writeAndFlush, then it will potentially (and possibly always, depending on the ordering of the contexts in the context chain) select the same context as in the pure write case, given that the mask used in the writeAndFlush invocation is MASK_WRITE | MASK_FLUSH.

This may help explain why a write might be executed after a flush even within the same thread, but that a writeAndFlush mitigates the issue.

https://github.com/netty/netty/compare/netty-4.1.34.Final...netty-4.1.35.Final
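To make the mask argument concrete, here is a simplified, stdlib-only model of mask-based outbound lookup. It is not Netty's actual code: the bit values, the Ctx class, the context names, and the executor labels are all hypothetical, and only the names MASK_WRITE and MASK_FLUSH come from the discussion above. It shows how a write and a flush can resolve to different contexts (and so to different executors), while a combined write-and-flush mask resolves to the same context as a plain write for this particular ordering.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of mask-based outbound lookup; NOT Netty's implementation.
final class MaskLookupSketch {
    // Hypothetical bit values chosen for this sketch.
    static final int MASK_WRITE = 1;
    static final int MASK_FLUSH = 1 << 1;

    // One pipeline entry: the operations it handles and the executor it runs on.
    static final class Ctx {
        final String name;
        final int mask;
        final String executor;

        Ctx(String name, int mask, String executor) {
            this.name = name;
            this.mask = mask;
            this.executor = executor;
        }
    }

    // Walks from the tail toward the head and returns the first context
    // whose mask overlaps the requested operations.
    static Ctx findOutbound(List<Ctx> pipeline, int mask) {
        for (int i = pipeline.size() - 1; i >= 0; i--) {
            Ctx ctx = pipeline.get(i);
            if ((ctx.mask & mask) != 0) {
                return ctx;
            }
        }
        throw new IllegalStateException("no outbound context matches mask " + mask);
    }

    // Listed in head-to-tail order; each context has its own (made-up) executor.
    static List<Ctx> examplePipeline() {
        return Arrays.asList(
                new Ctx("head", MASK_WRITE | MASK_FLUSH, "loop-1"),
                new Ctx("flushOnly", MASK_FLUSH, "loop-2"),
                new Ctx("writeOnly", MASK_WRITE, "loop-3"));
    }

    public static void main(String[] args) {
        List<Ctx> p = examplePipeline();
        Ctx w = findOutbound(p, MASK_WRITE);
        Ctx f = findOutbound(p, MASK_FLUSH);
        Ctx wf = findOutbound(p, MASK_WRITE | MASK_FLUSH);
        System.out.println("write         -> " + w.name + " on " + w.executor);   // writeOnly on loop-3
        System.out.println("flush         -> " + f.name + " on " + f.executor);   // flushOnly on loop-2
        System.out.println("writeAndFlush -> " + wf.name + " on " + wf.executor); // writeOnly on loop-3
    }
}
```

If flushOnly sat closer to the tail than writeOnly in this model, the combined mask would select flushOnly instead, which matches the caveat above that the outcome depends on the ordering of the contexts.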