RPC response queues are not recreated upon connection recovery

Describe the bug

Under specific circumstances, the RPC response queue cache is not cleared when the connection recovers. After recovery, new requests still reference the same response queue as before the connection broke. That queue no longer exists, because it is auto-deleted when the connection is lost. When the handler of the RPC request sends a response, it goes nowhere, so the RPC call hangs until the timeout is reached. The issue persists until the client is restarted.
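
To make the failure mode concrete, here is a minimal sketch in plain C#. It is not EasyNetQ code: the "broker" is just a set of queue names, and `GetReplyQueue`, `SimulateConnectionLoss` and `OnConnectionRecovered` are hypothetical names invented for illustration. It assumes the client caches one reply-queue name per request type, which is how the behavior looks from the outside.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model, not EasyNetQ internals.
// The "broker" is the set of queues that currently exist.
var brokerQueues = new HashSet<string>();
// The client caches one reply-queue name per request type.
var replyQueueCache = new Dictionary<Type, string>();

string GetReplyQueue(Type requestType)
{
    // Reuse the cached name if there is one; otherwise declare a new queue.
    if (!replyQueueCache.TryGetValue(requestType, out var name))
    {
        name = $"Resp_{requestType.Name}_{Guid.NewGuid()}";
        brokerQueues.Add(name); // "declare" the queue on the broker
        replyQueueCache[requestType] = name;
    }
    return name;
}

// True if a reply published to this queue would actually arrive.
bool ReplyIsDeliverable(string queue) => brokerQueues.Contains(queue);

// Auto-delete queues disappear together with the connection.
void SimulateConnectionLoss() => brokerQueues.Clear();

// The invalidation step that appears to be skipped in the reported scenario.
void OnConnectionRecovered() => replyQueueCache.Clear();

var before = GetReplyQueue(typeof(string));
SimulateConnectionLoss();

// Without invalidation the stale name is handed out again and replies are lost.
var stale = GetReplyQueue(typeof(string));
Console.WriteLine($"Stale queue reused: {stale == before}, deliverable: {ReplyIsDeliverable(stale)}");

// With invalidation a new queue is declared and replies can be delivered again.
OnConnectionRecovered();
var fresh = GetReplyQueue(typeof(string));
Console.WriteLine($"New queue declared: {fresh != before}, deliverable: {ReplyIsDeliverable(fresh)}");
```

The point of the model is only that the cache entry outlives the queue it names; where exactly the invalidation is missed inside the library is not something this sketch can show.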

To Reproduce

Steps to reproduce the behavior:

  1. You need a clustered RabbitMQ setup with at least two nodes.
  2. The request queue and the response queue must reside on different nodes.
  3. The prefetchCount connection string argument must be a large value, e.g. 10000.
  4. Restart the node that hosts the response queue to trigger a connection failover.

When the node goes offline, the response queue is auto-deleted. The RabbitMQ client fails over to another node, the connection is restored, and requests continue to be sent. However, no new response queue is created, and the EasyNetQ client seems to be unaware of this.

Example console app that reproduces the problem

// See https://aka.ms/new-console-template for more information

using EasyNetQ;
using EasyNetQ.DI;

var connectionString = "host=node1,node2,node3;virtualHost=test;username=user;password=pw;timeout=300;publisherConfirms=true;mandatoryPublish=true";
// Prefetch count is key. Must be large to trigger issue
var prefetch = "prefetchCount=10000";
connectionString += $";{prefetch}";

var conventions = new Conventions(new DefaultTypeNameSerializer())
{
    ConsumerTagConvention = () => "ReproducingApp",
    // Set the response queue name so we can find it in the management UI and use a guid so we can detect if it is recycled
    RpcReturnQueueNamingConvention = (messageType) => $"Resp_{messageType}_ReproducingApp_{Guid.NewGuid()}",
};

using var bus = RabbitHutch.CreateBus(connectionString, x =>
{
    x.EnableConsoleLogger();
    x.Register<IConventions>(conventions);
});

// Setup RPC responder
using var responder = await bus.Rpc.RespondAsync<string, string>((s) =>
{
    Console.WriteLine($"Handling {s}...");
    return $"Response to {s}";
});

// Setup handling of Ctrl+C 
using var cts = new CancellationTokenSource();
Console.CancelKeyPress += (object? sender, ConsoleCancelEventArgs args) =>
{
    cts.Cancel();
    args.Cancel = true;
};

try
{
    // Send messages forever
    var count = 1;
    while (true)
    {
        try
        {
            Console.WriteLine($"Sending request #{count}...");
            var response = await bus.Rpc.RequestAsync<string, string>($"Request #{count}", cts.Token);
            Console.WriteLine($"Received response from request #{count}");
            count++;
            await Task.Delay(TimeSpan.FromSeconds(5), cts.Token);
        }
        catch (EasyNetQException ex)
        {
            Console.WriteLine($"Caught RPC exception {ex.Message}");
        }
    }
}
catch (TaskCanceledException)
{
    Console.WriteLine("Handling Ctrl+C");
}

Expected behavior

I would expect the EasyNetQ client to create a new response queue once the connection is restored, and RPC communication to resume.

Please complete the following information:

  • EasyNetQ version: 7.2.0
  • RabbitMQ Server version: 3.8.30
  • RabbitMQ client version: 6.4.0

Additional context

A hint on what is going on: when the prefetch count is low, the RabbitMQ client raises a connection recovered event for both the consumer and the producer connection. When the prefetch count is high, only the consumer connection recovered event is raised. I don’t know why that is, but it is what I could observe.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments:9 (6 by maintainers)

Top GitHub Comments

1 reaction
Spinkelben commented, Oct 6, 2022

Thanks for the effort! Unfortunately, the fix doesn’t solve the problem as I see it. The behavior around the failure does change a bit. With the alpha version, a TaskCanceledException is thrown immediately after the connection recovers. However, if you catch that exception and continue using the client/bus to make RPC calls, subsequent calls still time out. The response queue is never re-created.

I had to make a few changes to the reproducing program so that it keeps going when the RPC calls throw a timeout exception.

The bottom part looks like this now:

try
{
    // Send messages forever
    var count = 1;
    while (true)
    {
        try
        {
            Console.WriteLine($"Sending request #{count}...");
            var response = await bus.Rpc.RequestAsync<string, string>($"Request #{count}", cts.Token);
            Console.WriteLine($"Received response from request #{count}");
            count++;
            await Task.Delay(TimeSpan.FromSeconds(5), cts.Token);
        }
        catch (TaskCanceledException ex)
        {
            Console.WriteLine(ex.ToString());
            if (cts.IsCancellationRequested)
            {
                throw;
            }
        }
        catch (EasyNetQException ex)
        {
            Console.WriteLine($"Caught RPC exception {ex.Message}");
        }
    }
}
catch (TaskCanceledException)
{
    Console.WriteLine("Handling Ctrl+C");
}
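
The loop above has to tell a cancellation requested by the user (Ctrl+C) apart from a TaskCanceledException raised by the library itself. A minimal, self-contained sketch of that pattern using exception filters follows. `FakeRequestAsync` is a hypothetical stand-in for `bus.Rpc.RequestAsync` that always fails with a cancellation the caller did not ask for; it is not part of EasyNetQ.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

using var cts = new CancellationTokenSource();
var libraryCancellations = 0;
var userCancelled = false;

// Hypothetical stand-in for bus.Rpc.RequestAsync: it honours the caller's token,
// and otherwise fails with a cancellation that the caller never requested.
Task<string> FakeRequestAsync(string request, CancellationToken token)
{
    token.ThrowIfCancellationRequested();
    return Task.FromException<string>(
        new TaskCanceledException("simulated library-side cancellation"));
}

for (var attempt = 1; attempt <= 3 && !userCancelled; attempt++)
{
    try
    {
        if (attempt == 3) cts.Cancel(); // simulate Ctrl+C before the third call
        await FakeRequestAsync($"Request #{attempt}", cts.Token);
    }
    catch (OperationCanceledException) when (cts.IsCancellationRequested)
    {
        // Our own token was cancelled: stop sending.
        userCancelled = true;
    }
    catch (OperationCanceledException)
    {
        // Cancellation we did not ask for: record it and keep going.
        libraryCancellations++;
    }
}

Console.WriteLine($"library cancellations: {libraryCancellations}, user cancelled: {userCancelled}");
```

Catching OperationCanceledException also covers TaskCanceledException, which derives from it, and the `when` filter keeps the two cases in separate handlers instead of rethrowing from inside a shared one.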

1 reaction
Pliner commented, Sep 28, 2022

Hi @Spinkelben,

Thanks for reporting this, I am going to investigate it tomorrow.
