Invalid Argument Exception in WeightedFairQueueByteDistributor

I tracked down a bug that causes clients to fail several requests when visiting our Netty HTTP/2 server from Chrome. Typically the Chrome Dev Tools page shows some requests failed because the connection was closed early. It seemingly only happens when coming from the browser, and only for very new connections; longer-lived connections don't exhibit the problem. Lastly, this bug appears to be pretty sensitive to timing: slowing the rate of requests seems to make it disappear.

The behavior of the bug is:

  1. The client sends some streams to the server, and then adjusts their priorities and dependencies.
  2. The server uses the WeightedFairQueueByteDistributor rather than the UniformStreamByteDistributor.
  3. The exception (shown below) is thrown; the HTTP/2 handler catches it and sends a GOAWAY frame.
  4. After the GOAWAY, the underlying connection is closed.
io.netty.handler.codec.http2.Http2Exception: Error flushing
	at io.netty.handler.codec.http2.Http2Exception.connectionError(Http2Exception.java:117)
	at io.netty.handler.codec.http2.Http2ConnectionHandler.flush(Http2ConnectionHandler.java:193)
	at io.netty.handler.codec.http2.Http2ConnectionHandler.channelWritabilityChanged(Http2ConnectionHandler.java:428)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelWritabilityChanged(AbstractChannelHandlerContext.java:441)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelWritabilityChanged(AbstractChannelHandlerContext.java:428)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelWritabilityChanged(AbstractChannelHandlerContext.java:421)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelWritabilityChanged(DefaultChannelPipeline.java:1433)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelWritabilityChanged(AbstractChannelHandlerContext.java:441)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelWritabilityChanged(AbstractChannelHandlerContext.java:428)
	at io.netty.channel.DefaultChannelPipeline.fireChannelWritabilityChanged(DefaultChannelPipeline.java:931)
	at io.netty.channel.ChannelOutboundBuffer.fireChannelWritabilityChanged(ChannelOutboundBuffer.java:628)
	at io.netty.channel.ChannelOutboundBuffer.setWritable(ChannelOutboundBuffer.java:594)
	at io.netty.channel.ChannelOutboundBuffer.decrementPendingOutboundBytes(ChannelOutboundBuffer.java:196)
	at io.netty.channel.ChannelOutboundBuffer.remove(ChannelOutboundBuffer.java:273)
	at io.netty.channel.ChannelOutboundBuffer.removeBytes(ChannelOutboundBuffer.java:352)
	at io.netty.channel.epoll.AbstractEpollStreamChannel.writeBytesMultiple(AbstractEpollStreamChannel.java:305)
	at io.netty.channel.epoll.AbstractEpollStreamChannel.doWriteMultiple(AbstractEpollStreamChannel.java:510)
	at io.netty.channel.epoll.AbstractEpollStreamChannel.doWrite(AbstractEpollStreamChannel.java:422)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:930)
	at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:532)
	at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:465)
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.IllegalArgumentException: e.priorityQueueIndex(): 0 (expected: -1) + e: {streamId 7 streamableBytes 0 activeCountForTree 6 pseudoTimeQueueIndex 0 pseudoTimeToWrite 119223 pseudoTime 0 flags 4 pseudoTimeQueue.size() 1 stateOnlyQueueIndex 0 parent.streamId 13} [{streamId 15 streamableBytes 185716 activeCountForTree 6 pseudoTimeQueueIndex 0 pseudoTimeToWrite 0 pseudoTime 0 flags 5 pseudoTimeQueue.size() 1 stateOnlyQueueIndex -1 parent.streamId 7} [{streamId 17 streamableBytes 98524 activeCountForTree 5 pseudoTimeQueueIndex 0 pseudoTimeToWrite 0 pseudoTime 0 flags 5 pseudoTimeQueue.size() 1 stateOnlyQueueIndex -1 parent.streamId 15} [{streamId 19 streamableBytes 167141 activeCountForTree 4 pseudoTimeQueueIndex 0 pseudoTimeToWrite 0 pseudoTime 0 flags 5 pseudoTimeQueue.size() 1 stateOnlyQueueIndex -1 parent.streamId 17} [{streamId 21 streamableBytes 1445060 activeCountForTree 3 pseudoTimeQueueIndex 0 pseudoTimeToWrite 0 pseudoTime 0 flags 5 pseudoTimeQueue.size() 1 stateOnlyQueueIndex -1 parent.streamId 19} [{streamId 23 streamableBytes 2920 activeCountForTree 2 pseudoTimeQueueIndex 0 pseudoTimeToWrite 0 pseudoTime 0 flags 5 pseudoTimeQueue.size() 1 stateOnlyQueueIndex -1 parent.streamId 21} [{streamId 25 streamableBytes 10 activeCountForTree 1 pseudoTimeQueueIndex 0 pseudoTimeToWrite 0 pseudoTime 0 flags 5 pseudoTimeQueue.size() 0 stateOnlyQueueIndex -1 parent.streamId 23} []]]]]]]
	at io.netty.util.internal.DefaultPriorityQueue.offer(DefaultPriorityQueue.java:88)
	at io.netty.util.internal.DefaultPriorityQueue.offer(DefaultPriorityQueue.java:31)
	at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor$State.offerPseudoTimeQueue(WeightedFairQueueByteDistributor.java:671)
	at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor.distributeToChildren(WeightedFairQueueByteDistributor.java:340)
	at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor.distribute(WeightedFairQueueByteDistributor.java:303)
	at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor.distributeToChildren(WeightedFairQueueByteDistributor.java:325)
	at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor.distribute(WeightedFairQueueByteDistributor.java:303)
	at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor.distributeToChildren(WeightedFairQueueByteDistributor.java:325)
	at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor.distribute(WeightedFairQueueByteDistributor.java:303)
	at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor.distributeToChildren(WeightedFairQueueByteDistributor.java:325)
	at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor.distribute(WeightedFairQueueByteDistributor.java:303)
	at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor.distributeToChildren(WeightedFairQueueByteDistributor.java:325)
	at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor.distribute(WeightedFairQueueByteDistributor.java:303)
	at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor.distributeToChildren(WeightedFairQueueByteDistributor.java:325)
	at io.netty.handler.codec.http2.WeightedFairQueueByteDistributor.distribute(WeightedFairQueueByteDistributor.java:273)
	at io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController$WritabilityMonitor.writePendingBytes(DefaultHttp2RemoteFlowController.java:627)
	at io.netty.handler.codec.http2.DefaultHttp2RemoteFlowController.writePendingBytes(DefaultHttp2RemoteFlowController.java:267)
	at io.netty.handler.codec.http2.Http2ConnectionHandler.flush(Http2ConnectionHandler.java:188)
	... 24 more

Steps to reproduce

I don’t have an exact way to reproduce it yet, but I can get it to happen about half the time after killing the connections in Chrome. I do have the HTTP/2 state logged by Chrome for a failure though. I can provide it if it would help.

Minimal yet complete reproducer code (or URL to code)

Netty version

4.1.51

JVM version (e.g. java -version)

JDK 11

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 9 (8 by maintainers)

Top GitHub Comments

2 reactions
mostroverkhov commented, Sep 2, 2020

As a workaround with 4.1.51, one can replace the remote flow controller so that it uses the UniformStreamByteDistributor:

Http2Connection connection = http2FrameCodec.connection();
      connection
          .remote()
          .flowController(
              new DefaultHttp2RemoteFlowController(
                  connection, new UniformStreamByteDistributor(connection)));
1 reaction
ejona86 commented, Nov 5, 2020

That IllegalArgumentException is quite cryptic, but looking at the code it should have said: “Cannot add an element that is already in the queue.” I’ll note that it looks like there is a very deep hierarchy of streams, such that it is effectively a linked list. I think the browser is providing a very strict dependency order.
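
To make that check concrete, here is a minimal sketch of a queue whose elements track their own position, with -1 meaning "not queued". It is a simplified stand-in and not Netty's DefaultPriorityQueue: the names Node and IndexTrackingQueue are hypothetical, and it uses FIFO order instead of a heap because only the index bookkeeping matters here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical queue element that remembers its own position; -1 means "not queued". */
final class Node {
    static final int NOT_IN_QUEUE = -1;
    final int streamId;
    int queueIndex = NOT_IN_QUEUE;

    Node(int streamId) {
        this.streamId = streamId;
    }
}

/** Simplified stand-in for an index-tracking queue (FIFO order, not a real heap). */
final class IndexTrackingQueue {
    private final List<Node> items = new ArrayList<>();

    /** Rejects an element whose index says it is already queued. */
    boolean offer(Node n) {
        if (n.queueIndex != Node.NOT_IN_QUEUE) {
            throw new IllegalArgumentException(
                "queueIndex: " + n.queueIndex + " (expected: -1); element is already in the queue");
        }
        n.queueIndex = items.size();
        items.add(n);
        return true;
    }

    /** Removes the head, resets its index to -1, and re-indexes the remaining elements. */
    Node poll() {
        if (items.isEmpty()) {
            return null;
        }
        Node head = items.remove(0);
        head.queueIndex = Node.NOT_IN_QUEUE;
        for (int i = 0; i < items.size(); i++) {
            items.get(i).queueIndex = i;
        }
        return head;
    }

    int size() {
        return items.size();
    }
}
```

Offering the same Node twice without a poll in between throws, which is the situation the message "0 (expected: -1)" in the trace describes.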

There appears to be a clearly matched poll for the childState that was being offered. Something is clearly re-adding it while distributing. I don’t see any !isDistributing() check when notifyParentChanged() calls offerAndInitializePseudoTime() (I see that in activeCountChangeForTree(), for example).

I think maybe a write() during distribute() is completing a stream and so onStreamRemoved() is called, which eventually offers some stream that is currently being distributed. So the fix may be as simple as surrounding notifyParentChanged()'s offerAndInitializePseudoTime() with a !isDistributing() condition. It’s unclear, but I think the activeCountChangeForTree() is appropriate even with distributing, so it wouldn’t be within the condition.
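
The suspected sequence can be sketched in isolation. This is a toy model under stated assumptions, not the real WeightedFairQueueByteDistributor: StreamState, ReentrantDistributorSketch, and the guardWhileDistributing flag are hypothetical, and the write callback stands in for a write() that completes a stream and triggers a parent-change notification.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical per-stream state; 'queued' stands in for priorityQueueIndex != -1. */
final class StreamState {
    final int streamId;
    boolean queued;

    StreamState(int streamId) {
        this.streamId = streamId;
    }
}

/** Toy distributor that shows how a re-entrant offer collides with distribute(). */
final class ReentrantDistributorSketch {
    private final Deque<StreamState> queue = new ArrayDeque<>();
    private final boolean guardWhileDistributing;
    private boolean distributing;

    ReentrantDistributorSketch(boolean guardWhileDistributing) {
        this.guardWhileDistributing = guardWhileDistributing;
    }

    void offer(StreamState s) {
        if (s.queued) {
            throw new IllegalArgumentException("stream " + s.streamId + " is already in the queue");
        }
        s.queued = true;
        queue.add(s);
    }

    /** Hypothetical analogue of the parent-change callback discussed above. */
    void notifyParentChanged(StreamState s) {
        if (guardWhileDistributing && distributing) {
            return; // distribute() re-offers the polled state itself
        }
        offer(s);
    }

    /** Polls one state, runs the write callback, then re-offers the state. */
    void distribute(Runnable write) {
        StreamState s = queue.poll();
        if (s == null) {
            return;
        }
        s.queued = false;
        distributing = true;
        try {
            write.run(); // may re-enter notifyParentChanged(s)
        } finally {
            distributing = false;
        }
        offer(s); // throws if the callback already re-added s
    }

    int size() {
        return queue.size();
    }
}
```

With the guard disabled, a callback that calls notifyParentChanged() on the state being distributed makes the final offer() throw; with the guard enabled, the state ends up queued exactly once. Whether skipping the offer is safe in the real code is the open question raised above.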

The code has changed dramatically and gotten much more complex since I reviewed it, such that it is fairly unrecognizable. This investigation was just based on reading the code. Based on the motivation of https://github.com/netty/netty/commit/c4e96d010e3d16810d7130c93169817b3d72b421, I assume the problem is when a child is writable before its parent, but I think it would take me quite some time to figure out precisely what is happening in order to make a test.

@carl-mastrangelo, could you try surrounding notifyParentChanged()'s offerAndInitializePseudoTime() with a !isDistributing() condition? If that fixes the problem, then restore the old code but print a stack trace when isDistributing() == true as well as the event.state; that along with the IllegalArgumentException dump might be enough to determine exactly which case is being triggered.
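
The diagnostic step could look roughly like the helper below. It is only a sketch: ReentrancyProbe and capture() are hypothetical names, and the distributing flag and state object would come from the distributor in a locally patched build.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/** Hypothetical helper for recording where a re-entrant offer comes from. */
final class ReentrancyProbe {
    private ReentrancyProbe() {
    }

    /** Returns a stack-trace dump if called while distributing, otherwise null. */
    static String capture(boolean distributing, Object state) {
        if (!distributing) {
            return null;
        }
        StringWriter out = new StringWriter();
        // A Throwable is created only to record the current call stack; it is never thrown.
        new Throwable("offer during distribute, state=" + state)
            .printStackTrace(new PrintWriter(out, true));
        return out.toString();
    }
}
```

Logging the returned string next to the exception dump would show which call path re-adds the state while distribute() is still running.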
