Cannot re-connect to server after nats-streaming-server is restarted
I’ve encountered the following issue while trying to reconnect to nats-streaming-server. Here are the steps to reproduce:
- nats-streaming-server is running
- the app (see demo code below) connects to nats-streaming-server and starts publishing messages
- the nats-streaming-server is killed
- on each subsequent Publish, the following is printed:
STAN.Client.StanConnectionException: The NATS connection is reconnecting
- after a while (server still not up), on each Publish, the app starts to receive
STAN.Client.StanConnectionClosedException: Connection closed.
- I restart nats-streaming-server
- The app’s ReconnectedEventHandler fires, indicating that the NATS connection has been restored.
- But the app’s StanConnection never recovers, and the application can never resume publishing successfully.
What can I do to let the app resume publishing?
Running on macOS Catalina, using the latest NuGet packages available at the time of writing:
<ItemGroup>
  <PackageReference Include="NATS.Client" Version="0.10.0" />
  <PackageReference Include="STAN.Client" Version="0.2.0" />
</ItemGroup>
Code used for testing:
using System;
using System.Diagnostics;
using System.Threading;
using NATS.Client;
using STAN.Client;

namespace StanTest
{
    class Program
    {
        static void Main(string[] args)
        {
            var cf = new ConnectionFactory();
            var sf = new StanConnectionFactory();
            var natsConnection = cf.CreateConnection(GetOpts());
            var stanOpts = StanOptions.GetDefaultOptions();
            stanOpts.ConnectTimeout = 4000;
            stanOpts.NatsConn = natsConnection;
            stanOpts.PubAckWait = 40000;
            var stanConnection = sf.CreateConnection("test-cluster", "uniq123", stanOpts);
            var watch = Stopwatch.StartNew();
            while (true)
            {
                try
                {
                    stanConnection.Publish("test", new byte[0x1]);
                    Console.WriteLine("{0}. Published message", watch.Elapsed);
                }
                catch (Exception e)
                {
                    Console.WriteLine("{0} - Type:{1}. On Publish: exception message: {2}", watch.Elapsed, e.GetType(), e.Message);
                }
                finally
                {
                    Thread.Sleep(1000);
                }
            }
        }

        private static Options GetOpts()
        {
            var opts = ConnectionFactory.GetDefaultOptions();
            opts.Url = "nats://localhost:4222";
            opts.AllowReconnect = true;
            opts.PingInterval = 5000;
            opts.MaxPingsOut = 2;
            opts.MaxReconnect = Options.ReconnectForever;
            opts.ReconnectWait = 1000;
            opts.Timeout = 4000;
            opts.ServerDiscoveredEventHandler += (sender, args) => Console.WriteLine("NATS server discovered");
            opts.ReconnectedEventHandler +=
                (sender, args) => Console.WriteLine("NATS server reconnected.");
            opts.ClosedEventHandler +=
                (sender, args) => Console.WriteLine("NATS connection closed");
            opts.DisconnectedEventHandler += (sender, args) =>
                Console.WriteLine("NATS connection disconnected");
            opts.AsyncErrorEventHandler +=
                (sender, args) => Console.WriteLine("NATS async error: {0}, Message={1}, Subject={2}", args.Conn.ConnectedUrl,
                    args.Error, args.Subscription.Subject);
            return opts;
        }
    }
}
And here are the messages printed to console while running:
/usr/local/share/dotnet/dotnet /Users/robert/Sandbox/StanTest/bin/Debug/netcoreapp3.0/StanTest.dll
00:00:00.0237181. Published message
00:00:01.0443621. Published message
NATS connection disconnected
00:00:02.0483280 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:03.0528216 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:04.0580304 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:05.0612458 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:06.0655462 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:07.0696808 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:08.0721798 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:09.0763581 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:10.0803322 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:11.0852381 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:12.0898777 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:13.0904445 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:14.0946184 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:15.0953681 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:16.0963306 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:17.1017706 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:18.1047256 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:19.1077148 - Type:STAN.Client.StanConnectionException. On Publish: exception message: The NATS connection is reconnecting
00:00:20.1114949 - Type:STAN.Client.StanConnectionClosedException. On Publish: exception message: Connection closed.
00:00:21.1162681 - Type:STAN.Client.StanConnectionClosedException. On Publish: exception message: Connection closed.
00:00:22.1206395 - Type:STAN.Client.StanConnectionClosedException. On Publish: exception message: Connection closed.
00:00:23.1238276 - Type:STAN.Client.StanConnectionClosedException. On Publish: exception message: Connection closed.
NATS server reconnected.
00:00:24.1287867 - Type:STAN.Client.StanConnectionClosedException. On Publish: exception message: Connection closed.
00:00:25.1324792 - Type:STAN.Client.StanConnectionClosedException. On Publish: exception message: Connection closed.
00:00:26.1373635 - Type:STAN.Client.StanConnectionClosedException. On Publish: exception message: Connection closed.
00:00:27.1406061 - Type:STAN.Client.StanConnectionClosedException. On Publish: exception message: Connection closed.
00:00:28.1452916 - Type:STAN.Client.StanConnectionClosedException. On Publish: exception message: Connection closed.
00:00:29.1501845 - Type:STAN.Client.StanConnectionClosedException. On Publish: exception message: Connection closed.
^C
Issue Analytics
- State:
- Created 4 years ago
- Comments:30 (13 by maintainers)
Top GitHub Comments
I have encountered the same situation in version 0.3.0. Once my STAN connection enters the connection-closed state, it never returns to a connected state, and every publish request throws the connection-closed exception.
This is my code to register the STAN connection:
And I use this connection like this:
Many thanks for the time you have spent on this issue! It seems odd to me: I cannot imagine a practical situation in which I would not want a STAN client to recover when its ping has failed but the enclosed NATS connection is still alive. I was under the impression that a STAN connection should be long-lived (and act as a singleton), like the NATS connection object. In my production app, I was registering the STAN connection as a singleton with the DI container. So, to conclude, you are telling me that:
Either:
Could you reconsider the architectural decision not to recover the STAN connection when the NATS streaming server runs in embedded mode and the STAN client uses a NATS connection that it does not manage itself, and somehow make it transparent to the STAN client how the NATS server is run?
I will try to reproduce this test using the Java streaming server client and see whether the behaviour is similar.
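If a closed STAN connection does not recover by itself, as this thread suggests, one possible workaround is to stop holding a single connection instance forever and rebuild it when a publish reports that it is closed. The following is only a minimal sketch of that idea. RecreatingPublisher and its three delegates are hypothetical names introduced here for illustration; they are not part of NATS.Client or STAN.Client, and the sketch has not been tested against a real streaming server.

```csharp
using System;

// Hypothetical helper, NOT part of NATS.Client or STAN.Client.
// It owns a connection of any disposable type and rebuilds it when a
// publish fails with an exception the caller classifies as "closed".
public sealed class RecreatingPublisher<TConn> where TConn : class, IDisposable
{
    private readonly Func<TConn> _connect;                   // creates a new connection
    private readonly Action<TConn, string, byte[]> _publish; // publishes on a connection
    private readonly Func<Exception, bool> _isClosed;        // true if the connection is dead
    private readonly object _gate = new object();
    private TConn _conn;

    public RecreatingPublisher(
        Func<TConn> connect,
        Action<TConn, string, byte[]> publish,
        Func<Exception, bool> isClosed)
    {
        _connect = connect ?? throw new ArgumentNullException(nameof(connect));
        _publish = publish ?? throw new ArgumentNullException(nameof(publish));
        _isClosed = isClosed ?? throw new ArgumentNullException(nameof(isClosed));
    }

    // Publishes once. If the current connection is reported as closed,
    // it is disposed, a new one is created, and the publish is retried
    // exactly one time. Any other exception propagates to the caller.
    public void Publish(string subject, byte[] data)
    {
        lock (_gate)
        {
            if (_conn == null)
            {
                _conn = _connect(); // lazy connect; may throw if the server is down
            }

            try
            {
                _publish(_conn, subject, data);
            }
            catch (Exception e) when (_isClosed(e))
            {
                DisposeQuietly(_conn);
                _conn = null;       // stays null if the reconnect below throws
                _conn = _connect();
                _publish(_conn, subject, data);
            }
        }
    }

    private static void DisposeQuietly(TConn conn)
    {
        try { conn.Dispose(); }
        catch { /* the old connection is already unusable; ignore */ }
    }
}
```

In the test program above, connect would presumably wrap sf.CreateConnection("test-cluster", "uniq123", stanOpts), publish would call Publish on the resulting connection, and isClosed would check for StanConnectionClosedException. Because the field is reset to null before reconnecting, a failed reconnect is simply retried on the next Publish call. Whether the server accepts the same client ID again right after the old connection closed is an open question that this sketch does not address.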