Infinite loop after NATS restart
We experience high CPU load in our services after NATS is restarted or crashes for some reason.
It seems like there is an infinite loop somewhere in the STAN.CLIENT.
We have profiled it and it seems like the issue is related to this:
instance bool [STAN.CLIENT] STAN.Client.BlockingDictionary`2[System.__Canon,System.__Canon]::Remove(!0,!1&,int32)
instance class STAN.Client.PublishAck [STAN.CLIENT] STAN.Client.Connection::removeAck(string)
instance void [STAN.CLIENT] STAN.Client.PublishAck::ackTimerCb(object)
After a quick look at the code, this snippet might be the issue when 0 is passed to BlockingDictionary.Remove() as the timeout from removeAck().
    while (d.Count == 0)
    {
        if (timeout < 0)
        {
            Monitor.Wait(dLock);
        }
        else
        {
            if (timeout > 0)
            {
                if (Monitor.Wait(dLock, timeout) == false)
                {
                    throw new Exception("timeout");
                }
            }
        }
    }
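To make the suspicion concrete: with a timeout of 0, neither branch calls Monitor.Wait, so the loop re-checks d.Count without ever yielding. If the loop runs while holding dLock (the snippet does not show the surrounding lock, so this is an assumption), no other thread can add an entry and the loop never exits. Below is a minimal sketch of one possible guard that treats 0 as "do not wait". WaitingDictionary is a hypothetical stand-in written for illustration, not the STAN.CLIENT class, and this is not necessarily how the maintainers fixed it.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for BlockingDictionary; not the STAN.CLIENT source.
public sealed class WaitingDictionary<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> d = new Dictionary<TKey, TValue>();
    private readonly object dLock = new object();

    public void Add(TKey key, TValue value)
    {
        lock (dLock)
        {
            d[key] = value;
            Monitor.PulseAll(dLock); // wake any thread blocked in Remove
        }
    }

    // timeout < 0: wait indefinitely; timeout == 0: do not wait;
    // timeout > 0: wait up to that many milliseconds per wake-up.
    public bool Remove(TKey key, out TValue value, int timeout)
    {
        lock (dLock)
        {
            while (d.Count == 0)
            {
                if (timeout == 0)
                {
                    // Return immediately instead of spinning on an empty dictionary.
                    value = default(TValue);
                    return false;
                }

                if (timeout < 0)
                {
                    Monitor.Wait(dLock);
                }
                else if (!Monitor.Wait(dLock, timeout))
                {
                    throw new TimeoutException("timeout");
                }
            }

            bool found = d.TryGetValue(key, out value);
            if (found)
            {
                d.Remove(key);
            }
            return found;
        }
    }
}
```

With this guard, a call such as Remove(key, out ack, 0) on an empty dictionary returns false right away, so a timer callback that fires after the acks were already cleared would not pin a CPU core.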
To reproduce it I do the following (it doesn’t happen every time):
- Crash NATS
  - I do this by sending a message which is too large for the MySQL data field. This was the reason we encountered the problem in the first place.
- Restart NATS
- Send a message to NATS through the STAN.CLIENT
- Inspect the docker container for high CPU usage
Let me know if I need to provide a repository for reproducing this issue. I hope it’s not necessary 😃
Issue Analytics
- State:
- Created 4 years ago
- Comments:7 (5 by maintainers)
Top GitHub Comments
FYI - I’ve found the issue, and have a fix but it’s not ideal imo. I want to clean up some of the code around publish ack processing to keep this clean. I’m hoping to have something tomorrow.
Good to hear! Thank you for testing, much appreciated!
If you do get more information about your performance issue or can isolate a test case please open an issue…