Connection recovery hangs after EOFException is thrown

Lyra version: 0.5.2 (also tested 0.5.3-SNAPSHOT with the same result); amqp-client version: 3.5.5; broker version: 3.5.6

We have an automated test suite for a Java RabbitMQ library built on top of Lyra and amqp-client. We use Docker so that the JUnit tests start and stop a real RabbitMQ broker on the same machine that runs the tests. One of the tests simulates a broker crash by running the docker kill command on the broker container while we are consuming from a queue on the broker.

We use persistent messages, so when the broker starts up again after a few seconds the messages are still in the queue, BUT it seems that Lyra does not reconnect to the broker in all cases.

On my colleague's machine, which is a Mac, Lyra correctly reconnects when the broker starts up again. But on my Linux machine only one reconnection attempt is made, and then everything freezes.

The difference we can see is that on the Mac we get a socket "connection refused" exception, but on Linux a java.io.EOFException is thrown.

This is a snippet from the logs in the test suite we have:

19:58:08.322 [main] INFO  c.m.d.test.RabbitConnectionTests - Killing the rabbitMQ broker
19:58:08.375 [AMQP Connection 127.0.0.1:5672] ERROR n.jodah.lyra.internal.ChannelHandler - Channel channel-1 on test-app-consume was closed unexpectedly
19:58:08.377 [AMQP Connection 127.0.0.1:5672] ERROR n.jodah.lyra.internal.ChannelHandler - Channel channel-1 on test-app-publish was closed unexpectedly
19:58:08.379 [AMQP Connection 127.0.0.1:5672] ERROR n.j.lyra.internal.ConnectionHandler - Connection test-app-publish was closed unexpectedly
19:58:08.385 [AMQP Connection 127.0.0.1:5672] ERROR n.j.lyra.internal.ConnectionHandler - Connection test-app-consume was closed unexpectedly
19:58:08.385 [lyra-recovery-1] INFO  n.j.lyra.internal.ConnectionHandler - Recovering connection test-app-publish to [localhost:5672]
19:58:08.387 [lyra-recovery-2] INFO  n.j.lyra.internal.ConnectionHandler - Recovering connection test-app-consume to [localhost:5672]
19:58:08.388 [rabbitmq-test-app-consume-consumer] ERROR c.m.d.c.impl.SingleChannelConsumer - The rabbit connection was unexpectedly disconnected. [ localPort=5672, queue="test-queue", consumerTag="test-app-consumer-1" ]
com.rabbitmq.client.ShutdownSignalException: connection error
    at com.rabbitmq.client.impl.AMQConnection.startShutdown(AMQConnection.java:723) ~[amqp-client-3.5.5.jar:na]
    at com.rabbitmq.client.impl.AMQConnection.shutdown(AMQConnection.java:713) ~[amqp-client-3.5.5.jar:na]
    at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:567) ~[amqp-client-3.5.5.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
Caused by: java.io.EOFException: null
    at java.io.DataInputStream.readUnsignedByte(DataInputStream.java:290) ~[na:1.8.0_60]
    at com.rabbitmq.client.impl.Frame.readFrom(Frame.java:95) ~[amqp-client-3.5.5.jar:na]
    at com.rabbitmq.client.impl.SocketFrameHandler.readFrame(SocketFrameHandler.java:139) ~[amqp-client-3.5.5.jar:na]
    at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:536) ~[amqp-client-3.5.5.jar:na]
    ... 1 common frames omitted
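
A minimal sketch of why the same broker crash can be handled differently per platform, assuming (hypothetically; this is not Lyra's actual code) that the recovery decision checks the failure and its cause chain against a fixed set of "recoverable" exception types. The class name RecoverableCheck, the DEFAULTS set and its contents (only ConnectException) are illustrative assumptions, and IOException stands in for amqp-client's ShutdownSignalException so the example runs with no dependencies.

```java
import java.io.EOFException;
import java.io.IOException;
import java.net.ConnectException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class RecoverableCheck {

  // Hypothetical "default" recoverable types; NOT Lyra's actual list.
  static final Set<Class<? extends Throwable>> DEFAULTS;

  static {
    Set<Class<? extends Throwable>> s = new HashSet<>();
    s.add(ConnectException.class); // what a "Connection refused" surfaces as
    DEFAULTS = Collections.unmodifiableSet(s);
  }

  // Returns true if the failure, or any exception in its cause chain,
  // is an instance of one of the recoverable types.
  static boolean isRecoverable(Throwable failure, Set<Class<? extends Throwable>> recoverable) {
    for (Throwable t = failure; t != null; t = t.getCause()) {
      for (Class<? extends Throwable> type : recoverable) {
        if (type.isInstance(t)) {
          return true;
        }
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Mac-like outcome: the reconnect attempt is refused outright.
    Throwable macStyle = new ConnectException("Connection refused");
    // Linux-like outcome: the stream ends unexpectedly. IOException is only a
    // stand-in wrapper; the log above shows amqp-client's ShutdownSignalException.
    Throwable linuxStyle = new IOException("connection error", new EOFException());

    System.out.println("mac-style recoverable:   " + isRecoverable(macStyle, DEFAULTS));
    System.out.println("linux-style recoverable: " + isRecoverable(linuxStyle, DEFAULTS));
  }
}
```

Running main prints true for the Mac-style failure and false for the Linux-style one: under this assumed policy, an EOFException never matches, so recovery would stop after the first failed attempt.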

Issue Analytics

  • State: closed
  • Created 8 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
jhalterman commented, Nov 20, 2015

Hi @karlney, interesting difference between platforms. One of the problems I've faced is deciding which errors should be considered recoverable and which shouldn't, since the exceptions you'll see for the same failure can vary by platform. If you have a reproducer for this that you could share, that would be great.

In the meantime, note that you can get and modify the set of exceptions that Lyra will attempt to recover from, which should resolve this situation for you:

http://jodah.net/lyra/javadoc/net/jodah/lyra/config/Config.html#getRecoverableExceptions--

As for what the appropriate solution should be… I’m not sure. Basically, we could add EOFException as one of the default exceptions to recover from. I’m just not sure how appropriate that is given the odd nature of this failure. Thoughts?
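
A minimal sketch of the "get and modify" workaround described in this comment. HypotheticalConfig is an illustrative stand-in, NOT Lyra's Config class: its default contents (only ConnectException) and its isRecoverable helper are assumptions made so the example runs without Lyra. With Lyra itself, the equivalent would presumably be adding EOFException.class to the set returned by Config.getRecoverableExceptions() (linked above); the exact generic type of that set is not verified here.

```java
import java.io.EOFException;
import java.io.IOException;
import java.net.ConnectException;
import java.util.HashSet;
import java.util.Set;

public class RecoverableConfigSketch {

  // Hypothetical config exposing its recoverable-exception set through a
  // mutable getter. NOT Lyra's Config; defaults are assumed.
  static class HypotheticalConfig {
    private final Set<Class<? extends Throwable>> recoverable = new HashSet<>();

    HypotheticalConfig() {
      recoverable.add(ConnectException.class);
    }

    // Returned by reference, so callers can add or remove types.
    Set<Class<? extends Throwable>> getRecoverableExceptions() {
      return recoverable;
    }

    // True if the failure or any exception in its cause chain has a recoverable type.
    boolean isRecoverable(Throwable failure) {
      for (Throwable t = failure; t != null; t = t.getCause()) {
        for (Class<? extends Throwable> type : recoverable) {
          if (type.isInstance(t)) {
            return true;
          }
        }
      }
      return false;
    }
  }

  public static void main(String[] args) {
    HypotheticalConfig config = new HypotheticalConfig();
    // Stand-in for the Linux failure: an EOFException wrapped in an IOException.
    Throwable linuxStyle = new IOException("connection error", new EOFException());

    System.out.println("before: " + config.isRecoverable(linuxStyle));
    // The workaround: treat EOFException as recoverable as well.
    config.getRecoverableExceptions().add(EOFException.class);
    System.out.println("after:  " + config.isRecoverable(linuxStyle));
  }
}
```

This prints "before: false" and then "after:  true". Because EOFException is a fairly generic signal, adding it widens what gets retried, which is the trade-off raised in the question about making it a default.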

0 reactions
michaelklishin commented, Oct 4, 2017

Answered in https://github.com/jhalterman/lyra/issues/53#issuecomment-278593159. Lyra has a configurable list of exceptions to try. If there’s interest in revisiting the default list, please file a separate issue (or submit a PR).
