Invalid Argument Exception in WeightedFairQueueByteDistributor
I tracked down a bug that causes clients to fail several requests when visiting our Netty HTTP/2 server from Chrome. Typically, the Chrome DevTools page shows that some requests failed because the connection was closed early. It seemingly only happens when coming from the browser. It also only seems to happen on very new connections; longer-lived connections don't exhibit the problem. Lastly, this bug appears to be quite sensitive to timing: slowing the rate of requests seems to make it disappear.
The behavior of the bug is:
- The client opens some streams to the server and then adjusts their priorities and dependencies.
- The server uses the WeightedFairQueueByteDistributor rather than the UniformStreamByteDistributor.
- The exception (shown below) is thrown, causing the Http2 handler to catch it and send a GOAWAY.
- After the GOAWAY, the underlying connection is closed.
io.netty.handler.codec.http2.Http2Exception: Error flushing
at io.netty.handler.codec.http2.Http2Exception.connectionError(Http2Exception.java:117)
at io.netty.handler.codec.http2.Http2ConnectionHandler.flush(Http2ConnectionHandler.java:193)
at io.netty.handler.codec.http2.Http2ConnectionHandler.channelWritabilityChanged(Http2ConnectionHandler.java:428)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelWritabilityChanged(AbstractChannelHandlerContext.java:441)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelWritabilityChanged(AbstractChannelHandlerContext.java:428)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelWritabilityChanged(AbstractChannelHandlerContext.java:421)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelWritabilityChanged(DefaultChannelPipeline.java:1433)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelWritabilityChanged(AbstractChannelHandlerContext.java:441)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelWritabilityChanged(AbstractChannelHandlerContext.java:428)
at io.netty.channel.DefaultChannelPipeline.fireChannelWritabilityChanged(DefaultChannelPipeline.java:931)
at io.netty.channel.ChannelOutboundBuffer.fireChannelWritabilityChanged(ChannelOutboundBuffer.java:628)
at io.netty.channel.ChannelOutboundBuffer.setWritable(ChannelOutboundBuffer.java:594)
at io.netty.channel.ChannelOutboundBuffer.decrementPendingOutboundBytes(ChannelOutboundBuffer.java:196)
at io.netty.channel.ChannelOutboundBuffer.remove(ChannelOutboundBuffer.java:273)
at io.netty.channel.ChannelOutboundBuffer.removeBytes(ChannelOutboundBuffer.java:352)
at io.netty.channel.epoll.AbstractEpollStreamChannel.writeBytesMultiple(AbstractEpollStreamChannel.java:305)
at io.netty.channel.epoll.AbstractEpollStreamChannel.doWriteMultiple(AbstractEpollStreamChannel.java:510)
at io.netty.channel.epoll.AbstractEpollStreamChannel.doWrite(AbstractEpollStreamChannel.java:422)
at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:930)
at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:532)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:465)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.IllegalArgumentException: e.priorityQueueIndex(): 0 (expected: -1) + e: {streamId 7 streamableBytes 0 activeCountForTree 6 pseudoTimeQueueIndex 0 pseudoTimeToWrite 119223 pseudoTime 0 flags 4 pseudoTimeQueue.size() 1 stateOnlyQueueIndex 0 parent.streamId 13} [{streamId 15 streamableBytes 185716 activeCountForTree 6 pseudoTimeQueueIndex 0 pseudoTimeToWrite 0 pseudoTime 0 flags 5 pseudoTimeQueue.size() 1 stateOnlyQueueIndex -1 parent.streamId 7} [{streamId 17 streamableBytes 98524 activeCountForTree 5 pseudoTimeQueueIndex 0 pseudoTimeToWrite 0 pseudoTime 0 flags 5 pseudoTimeQueue.size() 1 stateOnlyQueueIndex -1 parent.streamId 15} [{streamId 19 streamableBytes 167141 activeCountForTree 4 pseudoTimeQueueIndex 0 pseudoTimeToWrite 0 pseudoTime 0 flags 5 pseudoTimeQueue.size() 1 stateOnlyQueueIndex -1 parent.streamId 17} [{streamId 21 streamableBytes 1445060 activeCountForTree 3 pseudoTimeQueueIndex 0 pseudoTimeToWrite 0 pseudoTime 0 flags 5 pseudoTimeQueue.size() 1 stateOnlyQueueIndex -1 parent.streamId 19} [{streamId 23 streamableBytes 2920 activeCountForTree 2 pseudoTimeQueueIndex 0 pseudoTimeToWrite 0 pseudoTime 0 flags 5 pseudoTimeQueue.size() 1 stateOnlyQueueIndex -1 parent.streamId 21} [{streamId 25 streamableBytes 10 activeCountForTree 1 pseudoTimeQueueIndex 0 pseudoTimeToWrite 0 pseudoTime 0 flags 5 pseudoTimeQueue.size() 0 stateOnlyQueueIndex -1 parent.streamId 23} []]]]]]]
at io.netty.util.internal.DefaultPriorityQueue.offer(DefaultPriorityQueue.java:88)
at io.netty.util.internal.DefaultPriorityQueue.offer(DefaultPriorityQueue.java:31)
at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor$State.offerPseudoTimeQueue(WeightedFairQueueByteDistributor.java:671)
at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor.distributeToChildren(WeightedFairQueueByteDistributor.java:340)
at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor.distribute(WeightedFairQueueByteDistributor.java:303)
at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor.distributeToChildren(WeightedFairQueueByteDistributor.java:325)
at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor.distribute(WeightedFairQueueByteDistributor.java:303)
at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor.distributeToChildren(WeightedFairQueueByteDistributor.java:325)
at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor.distribute(WeightedFairQueueByteDistributor.java:303)
at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor.distributeToChildren(WeightedFairQueueByteDistributor.java:325)
at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor.distribute(WeightedFairQueueByteDistributor.java:303)
at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor.distributeToChildren(WeightedFairQueueByteDistributor.java:325)
at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor.distribute(WeightedFairQueueByteDistributor.java:303)
at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor.distributeToChildren(WeightedFairQueueByteDistributor.java:325)
at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor.distribute(WeightedFairQueueByteDistributor.java:273)
at io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$WritabilityMonitor.writePendingBytes(DefaultHttp2RemoteFlowController.java:627)
at io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController.writePendingBytes(DefaultHttp2RemoteFlowController.java:267)
at io.netty.handler.codec.http2.Http2ConnectionHandler.flush(Http2ConnectionHandler.java:188)
... 24 more
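As a reading aid for the Caused by message above: it prints each State in the affected subtree as {streamId … parent.streamId …}. The sketch below rebuilds that subtree from the printed streamId, streamableBytes, and parent.streamId values and recomputes activeCountForTree. It assumes, as a simplification rather than Netty's exact rule, that a stream counts as active when its streamableBytes is greater than zero; the class and method names are hypothetical. Under that assumption the recomputed counts match the dump (6, 6, 5, 4, 3, 2, 1), and no stream has more than one dependent, so the tree under stream 13 is effectively a linked list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reading aid: rebuilds the subtree printed in the
// IllegalArgumentException and recomputes activeCountForTree.
// Assumption: a stream is "active" iff streamableBytes > 0.
public class StreamChainDump {
    // {streamId, streamableBytes, parent.streamId}, copied from the dump.
    static final int[][] DUMP = {
        {7, 0, 13},
        {15, 185716, 7},
        {17, 98524, 15},
        {19, 167141, 17},
        {21, 1445060, 19},
        {23, 2920, 21},
        {25, 10, 23},
    };

    static final Map<Integer, Integer> BYTES = new LinkedHashMap<>();
    static final Map<Integer, List<Integer>> CHILDREN = new LinkedHashMap<>();

    static {
        for (int[] row : DUMP) {
            BYTES.put(row[0], row[1]);
            CHILDREN.computeIfAbsent(row[2], k -> new ArrayList<>()).add(row[0]);
        }
    }

    // Active streams in the subtree rooted at streamId, including streamId itself.
    static int activeCountForTree(int streamId) {
        int count = BYTES.getOrDefault(streamId, 0) > 0 ? 1 : 0;
        for (int child : CHILDREN.getOrDefault(streamId, List.of())) {
            count += activeCountForTree(child);
        }
        return count;
    }

    // True if no stream has more than one dependent, i.e. the tree is a linked list.
    static boolean isChain() {
        return CHILDREN.values().stream().allMatch(kids -> kids.size() <= 1);
    }

    public static void main(String[] args) {
        for (int[] row : DUMP) {
            System.out.println("stream " + row[0] + " activeCountForTree " + activeCountForTree(row[0]));
        }
        System.out.println("chain: " + isChain());
    }
}
```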
Steps to reproduce
I don’t have an exact way to reproduce it yet, but I can get it to happen about half the time after killing the connections in Chrome. I do have the HTTP/2 state logged by Chrome for a failure though. I can provide it if it would help.
Minimal yet complete reproducer code (or URL to code)
Netty version
4.1.51
JVM version (e.g. java -version)
JDK 11
Top GitHub Comments
For a workaround with 4.1.51, one can …

That IllegalArgumentException is quite cryptic, but looking at the code, it should have said: "Cannot add an element that is already in the queue." I'll note that there appears to be a very deep hierarchy of streams, such that it is effectively a linked list. I think the browser is providing a very strict dependency order.

There appears to be a clearly matched poll for the childState that was being offered, so something is clearly re-adding it while distributing. I don't see any !isDistributing() check when notifyParentChanged() calls offerAndInitializePseudoTime() (I do see one in activeCountChangeForTree(), for example).

I think maybe a write() during distribute() is completing a stream, and so onStreamRemoved() is called, which eventually offers some stream that is currently being distributed. So the fix may be as simple as surrounding notifyParentChanged()'s offerAndInitializePseudoTime() with a !isDistributing() condition. It's unclear, but I think the activeCountChangeForTree() call is appropriate even while distributing, so it wouldn't be within the condition.

The code has changed dramatically and gotten much more complex since I reviewed it, such that it is fairly unrecognizable. This investigation was based only on reading the code. Based on the motivation of https://github.com/netty/netty/commit/c4e96d010e3d16810d7130c93169817b3d72b421, I assume the problem occurs when a child is writable before its parent, but I think it would take me quite some time to figure out precisely what is happening in order to write a test.
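The re-entrancy hypothesis above can be illustrated with a toy model. This is not Netty code: IndexedQueue mimics only the index bookkeeping of io.netty.util.internal.DefaultPriorityQueue (an element remembers its queue index and may not be offered while that index is not -1), and Parent.distribute() and Parent.notifyParentChanged() are hypothetical stand-ins for distributeToChildren() and the notifyParentChanged() to offerAndInitializePseudoTime() path. The write callback re-offers the child that distribute() has just polled, which reproduces the start of the message in the dump; the optional guard skips that offer while distributing and leaves the re-offer to distribute().

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Netty code. Names are hypothetical stand-ins for
// DefaultPriorityQueue, distributeToChildren() and notifyParentChanged().
public class ReentrantOfferModel {
    static final int NOT_IN_QUEUE = -1;

    static final class Node {
        final int streamId;
        int queueIndex = NOT_IN_QUEUE; // analogue of priorityQueueIndex()

        Node(int streamId) {
            this.streamId = streamId;
        }
    }

    // Only the index bookkeeping of DefaultPriorityQueue is modeled; ordering is FIFO.
    static final class IndexedQueue {
        private final List<Node> items = new ArrayList<>();

        void offer(Node e) {
            if (e.queueIndex != NOT_IN_QUEUE) {
                throw new IllegalArgumentException(
                        "e.priorityQueueIndex(): " + e.queueIndex + " (expected: " + NOT_IN_QUEUE + ")");
            }
            e.queueIndex = items.size();
            items.add(e);
        }

        Node poll() {
            if (items.isEmpty()) {
                return null;
            }
            Node head = items.remove(0);
            head.queueIndex = NOT_IN_QUEUE;
            for (int i = 0; i < items.size(); i++) {
                items.get(i).queueIndex = i; // keep remaining indices in sync after the shift
            }
            return head;
        }
    }

    static final class Parent {
        final IndexedQueue childQueue = new IndexedQueue();
        final boolean guardEnabled;
        boolean distributing;

        Parent(boolean guardEnabled) {
            this.guardEnabled = guardEnabled;
        }

        // Stand-in for notifyParentChanged() -> offerAndInitializePseudoTime().
        void notifyParentChanged(Node child) {
            if (guardEnabled && distributing) {
                return; // proposed fix: distribute() re-offers the child itself
            }
            childQueue.offer(child);
        }

        // Stand-in for distributeToChildren(): poll a child, write, re-offer it.
        void distribute(Runnable write) {
            distributing = true;
            try {
                Node child = childQueue.poll();
                if (child == null) {
                    return;
                }
                write.run(); // a write may complete a stream and call back into the tree
                childQueue.offer(child);
            } finally {
                distributing = false;
            }
        }
    }

    // Returns "ok", or the IllegalArgumentException message if the double offer happens.
    static String run(boolean guardEnabled) {
        Parent parent = new Parent(guardEnabled);
        Node child = new Node(15);
        parent.childQueue.offer(child);
        try {
            parent.distribute(() -> parent.notifyParentChanged(child));
            return "ok";
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + run(false));
        System.out.println("with guard: " + run(true));
    }
}
```

Here run(false) returns "e.priorityQueueIndex(): 0 (expected: -1)" and run(true) returns "ok". The model deliberately ignores pseudo-time ordering; it only shows why an offer during distribution violates the queue's invariant, and why the guard depends on distribute() re-offering the child afterwards.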
@carl-mastrangelo, could you try surrounding notifyParentChanged()'s offerAndInitializePseudoTime() with a !isDistributing() condition? If that fixes the problem, then restore the old code but print a stack trace when isDistributing() == true, as well as the event.state; that, along with the IllegalArgumentException dump, might be enough to determine exactly which case is being triggered.