Unexpected timeouts in logical replication

Hi!

I am trying out the logical replication feature (https://www.npgsql.org/doc/replication.html) and I have a few questions. I hope you can help.

I have created a table, a publication and a replication slot. Then I copied the code from the documentation:

await foreach (var message in connection.StartReplication(slot, options, cancellationToken))
{
    Console.WriteLine(message);
}

But every time I run the application, I get all messages from the beginning. Is there some way to confirm the processing of the message? I’ve tried using SendStatusUpdate but it doesn’t work:

await foreach (var message in connection.StartReplication(slot, options, cancellationToken))
{
    Console.WriteLine(message);

    await connection.SendStatusUpdate(cancellationToken);
}

When the application does not receive messages for a long time, I get an exception:

Npgsql.NpgsqlException (0x80004005): Exception while reading from stream
 ---> System.TimeoutException: Timeout during reading attempt
   at Npgsql.NpgsqlConnector.<ReadMessage>g__ReadMessageLong|194_0(NpgsqlConnector connector, Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage)
   at Npgsql.Replication.ReplicationConnection.StartReplicationInternal(String command, Boolean bypassingStream, CancellationToken cancellationToken)+MoveNext()
   at Npgsql.Replication.ReplicationConnection.StartReplicationInternal(String command, Boolean bypassingStream, CancellationToken cancellationToken)+MoveNext()
   at Npgsql.Replication.ReplicationConnection.StartReplicationInternal(String command, Boolean bypassingStream, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at Npgsql.Replication.PgOutput.PgOutputAsyncEnumerable.StartReplicationInternal(CancellationToken cancellationToken)+MoveNext()
   at Npgsql.Replication.PgOutput.PgOutputAsyncEnumerable.StartReplicationInternal(CancellationToken cancellationToken)+MoveNext()
   at Npgsql.Replication.PgOutput.PgOutputAsyncEnumerable.StartReplicationInternal(CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()

I’ve tried using an infinite loop like this:

while (true)
{
    try
    {
        await foreach (var message in connection.StartReplication(slot, options, cancellationToken))
        {
            Console.WriteLine(message);
        }
    }
    catch (NpgsqlException ex)
    {
        Console.WriteLine(ex);
        continue;
    }
}

But this again reads all the messages from the beginning. How should I handle this situation correctly?

Is there some way to get old values in updated and deleted rows? In this case, there is no way to understand which row was deleted and process it:

if (message is DeleteMessage deleteMessage)
{
    // How to process this message?
}

Top GitHub Comments

Brar commented, Mar 25, 2021

@Chakrygin I just want to let you know that we’ve released 5.0.4 which contains the fix for the problem described above.

Brar commented, Mar 11, 2021

What is the difference between LastAppliedLsn and LastFlushedLsn? In what scenarios will the LastAppliedLsn update be useful?

It’s essentially two different levels of persistence that you can report back to the server.

Above I wrote that “I’d advise you to keep track of their log sequence number (LSN) in your consuming application”, but since I have no idea what your application will do and what consistency guarantees it needs, I didn’t go any further. You might process the transactions you received from the server in memory and report back via LastAppliedLsn that you’ve successfully applied the transaction in your system (e.g. that it’s visible to users). On the other hand, you may not want to persist the transaction to disk storage immediately (e.g. for performance reasons) using fsync (or FileStream.Flush()), but once you do so, you can report this back to the server via LastFlushedLsn.
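To make the two levels concrete, here is a minimal sketch of reporting them independently. It assumes an already opened LogicalReplicationConnection and the same slot and options objects as in the question. The applyInMemory and flushToDisk callbacks and the flushEvery batch size are hypothetical placeholders for whatever the consuming application does; they are not part of Npgsql.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Npgsql.Replication;
using Npgsql.Replication.PgOutput;

static class TwoLevelAcknowledgement
{
    // applyInMemory, flushToDisk and flushEvery are hypothetical, application-defined
    // pieces used only for illustration.
    public static async Task ConsumeAsync(
        LogicalReplicationConnection connection,
        PgOutputReplicationSlot slot,
        PgOutputReplicationOptions options,
        Action<object> applyInMemory,
        Func<Task> flushToDisk,
        int flushEvery,
        CancellationToken cancellationToken)
    {
        var unflushed = 0;

        await foreach (var message in connection.StartReplication(slot, options, cancellationToken))
        {
            // Level 1: the change is applied (e.g. visible to users) but not yet durable.
            applyInMemory(message);
            connection.LastAppliedLsn = message.WalEnd;

            // Level 2: only after the application has persisted its state
            // do we report the position as flushed.
            if (++unflushed >= flushEvery)
            {
                await flushToDisk();
                connection.LastFlushedLsn = message.WalEnd;
                unflushed = 0;
            }
        }
    }
}
```

In this sketch the flushed position lags behind the applied position by up to flushEvery messages, so after a restart the application should expect to see again any changes it had applied but not yet reported as flushed.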

In synchronous replication you can use the synchronous_commit server configuration option to configure the guarantees the server will await from the replication standby (your application) for transaction commits.

You can have a look at our SynchronousReplication test if you want to see the details.

Am I correct in understanding that updating LastAppliedLsn is optional?

I’d say yes, for asynchronous replication scenarios, but if you look at the documentation around synchronous_commit you’ll probably see that it’s pretty confusing. Personally I’d always assign both of them, either at the same time or independently, depending on whether the client has applied the transaction or has flushed it to the storage system.
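For the simple case from the question, where each message is fully handled inside the loop, assigning both values at the same time could look like the sketch below. It assumes an already opened LogicalReplicationConnection plus the slot and options from the question. The explicit SendStatusUpdate call is optional here and only forces an immediate report; calling it without first assigning the two properties presumably reports unchanged positions, which would explain why it appeared to do nothing in the question.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Npgsql.Replication;
using Npgsql.Replication.PgOutput;

static class ConfirmEachMessage
{
    public static async Task ConsumeAsync(
        LogicalReplicationConnection connection,
        PgOutputReplicationSlot slot,
        PgOutputReplicationOptions options,
        CancellationToken cancellationToken)
    {
        await foreach (var message in connection.StartReplication(slot, options, cancellationToken))
        {
            // Stand-in for the application's real processing.
            Console.WriteLine(message);

            // Record that everything up to this message is both applied and persisted.
            connection.LastAppliedLsn = message.WalEnd;
            connection.LastFlushedLsn = message.WalEnd;

            // Optional: push the new positions to the server right away instead of
            // waiting for the next periodic status update.
            await connection.SendStatusUpdate(cancellationToken);
        }
    }
}
```

With the positions confirmed this way, a restarted consumer on the same slot should resume after the last confirmed position instead of replaying every message from the beginning.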