Unexpected timeouts in logical replication
Hi!
I am trying out the logical replication feature (https://www.npgsql.org/doc/replication.html) and I have a few questions. I hope you can help.
- I have created a table, a publication and a replication slot. Then I copied the code from the documentation:
await foreach (var message in connection.StartReplication(slot, options, cancellationToken))
{
    Console.WriteLine(message);
}
But every time I run the application, I get all messages from the beginning. Is there some way to confirm the processing of the message? I’ve tried using SendStatusUpdate but it doesn’t work:
await foreach (var message in connection.StartReplication(slot, options, cancellationToken))
{
    Console.WriteLine(message);
    await connection.SendStatusUpdate(cancellationToken);
}
- When the application does not receive messages for a long time, I get an exception:
Npgsql.NpgsqlException (0x80004005): Exception while reading from stream
 ---> System.TimeoutException: Timeout during reading attempt
   at Npgsql.NpgsqlConnector.<ReadMessage>g__ReadMessageLong|194_0(NpgsqlConnector connector, Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage)
   at Npgsql.Replication.ReplicationConnection.StartReplicationInternal(String command, Boolean bypassingStream, CancellationToken cancellationToken)+MoveNext()
   at Npgsql.Replication.ReplicationConnection.StartReplicationInternal(String command, Boolean bypassingStream, CancellationToken cancellationToken)+MoveNext()
   at Npgsql.Replication.ReplicationConnection.StartReplicationInternal(String command, Boolean bypassingStream, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
   at Npgsql.Replication.PgOutput.PgOutputAsyncEnumerable.StartReplicationInternal(CancellationToken cancellationToken)+MoveNext()
   at Npgsql.Replication.PgOutput.PgOutputAsyncEnumerable.StartReplicationInternal(CancellationToken cancellationToken)+MoveNext()
   at Npgsql.Replication.PgOutput.PgOutputAsyncEnumerable.StartReplicationInternal(CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
I’ve tried using an infinite loop like this:
while (true)
{
    try
    {
        await foreach (var message in connection.StartReplication(slot, options, cancellationToken))
        {
            Console.WriteLine(message);
        }
    }
    catch (NpgsqlException ex)
    {
        Console.WriteLine(ex);
        continue;
    }
}
But this again reads all the messages from the beginning. How should I handle this situation correctly?
- Is there some way to get old values in updated and deleted rows? In this case, there is no way to understand which row was deleted and process it:
if (message is DeleteMessage deleteMessage)
{
    // How to process this message?
}
Issue Analytics
- State:
- Created 3 years ago
- Comments:16 (13 by maintainers)
@Chakrygin I just want to let you know that we’ve released 5.0.4 which contains the fix for the problem described above.
It’s essentially two different levels of persistence that you can report back to the server.
Above I wrote that “I’d advise you to keep track of their log sequence number (LSN) in your consuming application”, but since I have no idea what your application will do and what consistency guarantees it needs, I didn’t go any further. You might process the transactions you received from the server in memory and report back via LastAppliedLsn that you’ve successfully applied the transaction in your system (e.g. that it’s visible to users). On the other hand, you may not want to persist the transaction to disk storage immediately (e.g. for performance reasons) using fsync (or FileStream.Flush()), but once you do so, you can report this back to the server via LastFlushedLsn.
In synchronous replication you can use the synchronous_commit server configuration option to configure the guarantees the server will await from the replication standby (your application) for transaction commits.
You can have a look at our SynchronousReplication test if you want to look at the details.
I’d say yes, for asynchronous replication scenarios, but if you look at the documentation around synchronous_commit you’ll probably see that it’s pretty confusing. Personally I’d always assign both of them, either at the same time or independently, depending on whether the client has applied the transaction or has flushed it to the storage system.
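For illustration, here is a minimal sketch of the "assign both of them at the same time" variant, applied to the loop from the original question. It assumes that the message is fully processed (here it is only printed) before its position is reported, and that each message's WalEnd is a reasonable position to acknowledge; an application with stricter consistency needs might only advance the LSNs at commit boundaries, or set LastFlushedLsn later than LastAppliedLsn. ConsumeAsync is a hypothetical helper name, and the connection, slot and options are assumed to be created as in the question.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Npgsql.Replication;
using Npgsql.Replication.PgOutput;

public static class ReplicationConsumer
{
    // Hypothetical helper: consumes messages and reports progress to the server.
    public static async Task ConsumeAsync(
        LogicalReplicationConnection connection,
        PgOutputReplicationSlot slot,
        PgOutputReplicationOptions options,
        CancellationToken cancellationToken)
    {
        await foreach (var message in connection.StartReplication(slot, options, cancellationToken))
        {
            // "Process" the message; a real application would apply it to its own state here.
            Console.WriteLine(message);

            // Assumption: after this point the message counts as both applied and durably stored,
            // so both positions are advanced together.
            connection.LastAppliedLsn = message.WalEnd;
            connection.LastFlushedLsn = message.WalEnd;

            // Optional: push the new positions to the server right away
            // instead of waiting for the next status update.
            // await connection.SendStatusUpdate(cancellationToken);
        }
    }
}
```

With the positions reported this way, the server should be able to advance the slot, so a restarted consumer is not expected to receive the already acknowledged messages again. Calling SendStatusUpdate alone, as in the question, reports whatever positions are currently set, which is why it has no visible effect if the LSNs were never assigned.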