Connection recovery hangs after EOFException is thrown
Lyra version 0.5.2 (also tested 0.5.3-SNAPSHOT with the same result), amqp-client 3.5.5, broker version 3.5.6
We have an automated test suite for a Java RabbitMQ library built on top of Lyra and amqp-client. We use Docker and have the JUnit tests start and stop a real RabbitMQ broker on the same machine that runs the tests. One of the tests simulates a broker crash by running docker kill on the broker container while we are consuming from a queue on that broker.
We use persistent messages, so when the broker starts up again after a few seconds the messages are still in the queue. However, it seems that Lyra does not reconnect to the broker in all cases.
On my colleague's machine, which is a Mac, Lyra correctly reconnects when the broker starts up again. But on my Linux machine only one reconnection attempt is made, and then everything freezes.
The difference we can see is that on the Mac we get a socket "connection refused" exception, but on Linux a java.io.EOFException is thrown.
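The two exceptions correspond to two different client-side failure modes. The stdlib-only Java sketch below reproduces both on the loopback interface. The class name `DisconnectModes` is hypothetical. In the first case, connecting to a port with no listener fails with `java.net.ConnectException` ("Connection refused"). In the second case, a peer accepts the TCP connection and then closes it without sending anything, which makes `DataInputStream.readUnsignedByte()` throw `java.io.EOFException`. That is the same call that appears at the top of the `Caused by` trace in the log below. The sketch does not involve Docker or a broker, and it does not test which of the two modes a killed container actually produces on a given host.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class: shows the two ways a client can observe a dead peer.
final class DisconnectModes {

    // Connects to a port on which nothing is listening.
    // Expected result: ConnectException ("Connection refused").
    static String connectToClosedPort() throws IOException {
        int port;
        try (ServerSocket probe = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            port = probe.getLocalPort(); // grab a free port, then release it
        }
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
            return "connected";
        } catch (ConnectException e) {
            return e.getClass().getSimpleName();
        }
    }

    // Connects to a peer that accepts the TCP connection and closes it at once.
    // Expected result: EOFException from readUnsignedByte(), since the stream ends.
    static String readFromClosingPeer() throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread acceptor = new Thread(() -> {
                try (Socket peer = server.accept()) {
                    // close immediately without writing any bytes
                } catch (IOException ignored) {
                }
            });
            acceptor.start();
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
                 DataInputStream in = new DataInputStream(client.getInputStream())) {
                in.readUnsignedByte(); // same call amqp-client's Frame.readFrom makes
                return "read a byte";
            } catch (EOFException e) {
                return e.getClass().getSimpleName();
            } finally {
                acceptor.join();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("nothing listening:   " + connectToClosedPort());
        System.out.println("peer closes at once: " + readFromClosingPeer());
    }
}
```

A recovery policy that only treats "connection refused" as transient will therefore handle the first mode and not the second, even though both mean that the broker is unavailable.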
This is a snippet from the logs in the test suite we have:
19:58:08.322 [main] INFO c.m.d.test.RabbitConnectionTests - Killing the rabbitMQ broker
19:58:08.375 [AMQP Connection 127.0.0.1:5672] ERROR n.jodah.lyra.internal.ChannelHandler - Channel channel-1 on test-app-consume was closed unexpectedly
19:58:08.377 [AMQP Connection 127.0.0.1:5672] ERROR n.jodah.lyra.internal.ChannelHandler - Channel channel-1 on test-app-publish was closed unexpectedly
19:58:08.379 [AMQP Connection 127.0.0.1:5672] ERROR n.j.lyra.internal.ConnectionHandler - Connection test-app-publish was closed unexpectedly
19:58:08.385 [AMQP Connection 127.0.0.1:5672] ERROR n.j.lyra.internal.ConnectionHandler - Connection test-app-consume was closed unexpectedly
19:58:08.385 [lyra-recovery-1] INFO n.j.lyra.internal.ConnectionHandler - Recovering connection test-app-publish to [localhost:5672]
19:58:08.387 [lyra-recovery-2] INFO n.j.lyra.internal.ConnectionHandler - Recovering connection test-app-consume to [localhost:5672]
19:58:08.388 [rabbitmq-test-app-consume-consumer] ERROR c.m.d.c.impl.SingleChannelConsumer - The rabbit connection was unexpectedly disconnected. [ localPort=5672, queue="test-queue", consumerTag="test-app-consumer-1" ]
com.rabbitmq.client.ShutdownSignalException: connection error
at com.rabbitmq.client.impl.AMQConnection.startShutdown(AMQConnection.java:723) ~[amqp-client-3.5.5.jar:na]
at com.rabbitmq.client.impl.AMQConnection.shutdown(AMQConnection.java:713) ~[amqp-client-3.5.5.jar:na]
at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:567) ~[amqp-client-3.5.5.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
Caused by: java.io.EOFException: null
at java.io.DataInputStream.readUnsignedByte(DataInputStream.java:290) ~[na:1.8.0_60]
at com.rabbitmq.client.impl.Frame.readFrom(Frame.java:95) ~[amqp-client-3.5.5.jar:na]
at com.rabbitmq.client.impl.SocketFrameHandler.readFrame(SocketFrameHandler.java:139) ~[amqp-client-3.5.5.jar:na]
at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:536) ~[amqp-client-3.5.5.jar:na]
... 1 common frames omitted
Top GitHub Comments
Hi @karlney. Interesting difference between platforms. One of the problems I've faced is deciding which errors should be considered recoverable and which shouldn't, since the exceptions you see for the same failure can vary by platform. If you have a reproducer for this that you could share, that would be great.
In the meantime, note that you can get and modify the set of exceptions that Lyra will attempt to recover from, which should resolve this situation for you:
http://jodah.net/lyra/javadoc/net/jodah/lyra/config/Config.html#getRecoverableExceptions--
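The suggested workaround amounts to checking each failure against a user-editable set of exception types. Below is a minimal, self-contained Java sketch of that idea. `RecoverablePolicy`, its default contents and its cause-chain walk are hypothetical illustrations. They are not Lyra's `Config` API or its actual matching rules. For the real behavior, see `Config.getRecoverableExceptions()` at the link above.

```java
import java.io.EOFException;
import java.io.IOException;
import java.net.ConnectException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical policy object illustrating "a configurable set of recoverable
// exceptions". This is NOT Lyra's Config class or its actual matching logic.
final class RecoverablePolicy {
    private final Set<Class<? extends Throwable>> recoverable = new LinkedHashSet<>();

    RecoverablePolicy() {
        // Illustrative default only: treat "connection refused" as recoverable.
        recoverable.add(ConnectException.class);
    }

    // Returns the live, mutable set so callers can add or remove types.
    Set<Class<? extends Throwable>> getRecoverableExceptions() {
        return recoverable;
    }

    // Treats a failure as recoverable if it, or any exception in its cause
    // chain, is an instance of one of the configured types. The identity set
    // guards against cyclic cause chains.
    boolean isRecoverable(Throwable failure) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable t = failure; t != null && seen.add(t); t = t.getCause()) {
            for (Class<? extends Throwable> type : recoverable) {
                if (type.isInstance(t)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RecoverablePolicy policy = new RecoverablePolicy();

        // Stand-ins for the two failures in the report. The IOException plays the
        // role of amqp-client's ShutdownSignalException wrapping an EOFException.
        Throwable macFailure = new ConnectException("Connection refused");
        Throwable linuxFailure = new IOException("connection error", new EOFException());

        System.out.println(policy.isRecoverable(macFailure));   // true
        System.out.println(policy.isRecoverable(linuxFailure)); // false

        // Opt in to EOFException, analogous to the workaround suggested above.
        policy.getRecoverableExceptions().add(EOFException.class);
        System.out.println(policy.isRecoverable(linuxFailure)); // true
    }
}
```

The cause-chain walk matters because, in the log above, the `EOFException` is not thrown directly. It is the cause of a `ShutdownSignalException`. The thread does not say whether Lyra inspects causes this way.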
As for what the appropriate solution should be… I’m not sure. Basically, we could add EOFException as one of the default exceptions to recover from. I’m just not sure how appropriate that is given the odd nature of this failure. Thoughts?
Answered in https://github.com/jhalterman/lyra/issues/53#issuecomment-278593159. Lyra has a configurable list of exceptions to try. If there’s interest in revisiting the default list, please file a separate issue (or submit a PR).