RPC response queues are not recreated upon connection recovery

Describe the bug

Under specific circumstances, the RPC response queue cache is not cleared upon connection recovery. After the connection recovers, new requests still reference the same response queue as before the connection was broken. That queue no longer exists, because it is auto-deleted when the connection is lost. When the handler of the RPC request sends a response, it goes nowhere. As a result, each RPC call hangs until the timeout is reached. The issue persists until the client is restarted.

To Reproduce

Steps to reproduce the behavior:

1. Set up a clustered RabbitMQ deployment with at least two nodes.
2. Make sure the request queue and the response queue reside on different nodes.
3. Set the prefetchCount connection string argument to a large value, e.g. 10000.
4. Restart the node that hosts the response queue to trigger a connection failover.

When the node goes offline, the response queue is auto-deleted. The RabbitMQ client fails over to another node, the connection is restored, and requests continue to be sent. However, no new response queue is created, and the EasyNetQ client seems to be unaware of this.
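To make the failure mode concrete, here is a minimal toy model of a response queue cache that is not invalidated on recovery. It is a sketch under stated assumptions, not EasyNetQ's actual implementation: ToyBroker, ToyRpcClient, and OnConnectionRecovered are hypothetical names invented for illustration, and the "broker" only tracks which queue names exist.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-ins only; none of these types exist in EasyNetQ or RabbitMQ.Client.
var broker = new ToyBroker();
var client = new ToyRpcClient(broker);

var before = client.GetResponseQueue("System.String");
Console.WriteLine($"Before failover: exists = {broker.Exists(before)}");

// Node restart: auto-delete queues disappear together with the connection.
broker.DropAllQueues();

// Recovery WITHOUT cache invalidation: the stale queue name is handed out again.
var stale = client.GetResponseQueue("System.String");
Console.WriteLine($"Stale cache:     exists = {broker.Exists(stale)}");

// Recovery WITH cache invalidation: the next request declares a fresh queue.
client.OnConnectionRecovered();
var fresh = client.GetResponseQueue("System.String");
Console.WriteLine($"Cleared cache:   exists = {broker.Exists(fresh)}");

// Hypothetical broker that only remembers which queue names are declared.
sealed class ToyBroker
{
    private readonly HashSet<string> queues = new();

    public void Declare(string name) => queues.Add(name);
    public bool Exists(string name) => queues.Contains(name);
    public void DropAllQueues() => queues.Clear();
}

// Hypothetical RPC client that caches one response queue name per message type.
sealed class ToyRpcClient
{
    private readonly ToyBroker broker;
    private readonly Dictionary<string, string> responseQueues = new();

    public ToyRpcClient(ToyBroker broker) => this.broker = broker;

    public string GetResponseQueue(string messageType)
    {
        // Declare the queue only on a cache miss; a cache hit skips the broker entirely.
        if (!responseQueues.TryGetValue(messageType, out var name))
        {
            name = $"Resp_{messageType}_{Guid.NewGuid()}";
            broker.Declare(name);
            responseQueues[messageType] = name;
        }
        return name;
    }

    // Assumed recovery hook: forget cached names so they are re-declared on next use.
    public void OnConnectionRecovered() => responseQueues.Clear();
}
```

In this model the second lookup returns a name the broker no longer knows, which mirrors the observed hang: replies are published to a queue that is gone. Only after the cache is cleared does a lookup declare a queue that exists again.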

Example console app that reproduces the problem

// See https://aka.ms/new-console-template for more information

using EasyNetQ;
using EasyNetQ.DI;

var connectionString = "host=node1,node2,node3;virtualHost=test;username=user;password=pw;timeout=300;publisherConfirms=true;mandatoryPublish=true";
// Prefetch count is key. Must be large to trigger issue
var prefetch = "prefetchCount=10000";
connectionString += $";{prefetch}";

var conventions = new Conventions(new DefaultTypeNameSerializer())
{
    ConsumerTagConvention = () => "ReproducingApp",
    // Set the response queue name so we can find it in the management UI and use a guid so we can detect if it is recycled
    RpcReturnQueueNamingConvention = (messageType) => $"Resp_{messageType}_ReproducingApp_{Guid.NewGuid()}",
};

using var bus = RabbitHutch.CreateBus(connectionString, x =>
{
    x.EnableConsoleLogger();
    x.Register<IConventions>(conventions);
});

// Setup RPC responder
using var responder = await bus.Rpc.RespondAsync<string, string>((s) =>
{
    Console.WriteLine($"Handling {s}...");
    return $"Response to {s}";
});

// Setup handling of Ctrl+C 
using var cts = new CancellationTokenSource();
Console.CancelKeyPress += (object? sender, ConsoleCancelEventArgs args) =>
{
    cts.Cancel();
    args.Cancel = true;
};

try
{
    // Send messages forever
    var count = 1;
    while (true)
    {
        try
        {
            Console.WriteLine($"Sending request #{count}...");
            var response = await bus.Rpc.RequestAsync<string, string>($"Request #{count}", cts.Token);
            Console.WriteLine($"Received response from request #{count}");
            count++;
            await Task.Delay(TimeSpan.FromSeconds(5), cts.Token);
        }
        catch (EasyNetQException ex)
        {
            Console.WriteLine($"Caught RPC exception {ex.Message}");
        }
    }
}
catch (TaskCanceledException)
{
    Console.WriteLine("Handling Ctrl+C");
}

Expected behavior

I would expect the EasyNetQ client to create a new response queue once the connection is restored, and RPC communication to resume.

Please complete the following information:

EasyNetQ version: 7.2.0
RabbitMQ Server version: 3.8.30
RabbitMQ client version: 6.4.0

Additional context

A hint on what is going on: when the prefetch count is low, the RabbitMQ client raises a connection recovered event for both the consumer and the producer connection. When the prefetch count is high, only the consumer connection recovered event is raised. I don’t know why that is, but it is what I could observe.

Issue Analytics

Created a year ago
Comments: 9 (6 by maintainers)

Top GitHub Comments

Spinkelben commented, Oct 6, 2022

Thanks for the effort! Unfortunately, the fix doesn’t solve the problem as I see it. The behavior around the failure does change a bit: with the alpha version, a TaskCanceledException is thrown immediately after the connection recovers. However, if you catch that exception and continue using the client/bus to make RPC calls, subsequent calls still time out. The response queue is never re-created.

I had to make a few changes to the reproducing program so that it keeps going when the RPC calls throw a timeout exception.

The bottom part looks like this now:

try
{
    // Send messages forever
    var count = 1;
    while (true)
    {
        try
        {
            Console.WriteLine($"Sending request #{count}...");
            var response = await bus.Rpc.RequestAsync<string, string>($"Request #{count}", cts.Token);
            Console.WriteLine($"Received response from request #{count}");
            count++;
            await Task.Delay(TimeSpan.FromSeconds(5), cts.Token);
        }
        catch (TaskCanceledException ex)
        {
            Console.WriteLine(ex.ToString());
            if (cts.IsCancellationRequested)
            {
                throw;
            }
        }
        catch (EasyNetQException ex)
        {
            Console.WriteLine($"Caught RPC exception {ex.Message}");
        }
    }
}
catch (TaskCanceledException)
{
    Console.WriteLine("Handling Ctrl+C");
}

Pliner commented, Sep 28, 2022

Hi @Spinkelben,

Thanks for reporting this, I am going to investigate it tomorrow.