Infinite loop after NATS restart


We see high CPU load in our services after NATS is restarted or crashes. There appears to be an infinite loop somewhere in STAN.CLIENT.

We profiled it, and the issue seems to be related to this call stack:

instance bool [STAN.CLIENT] STAN.Client.BlockingDictionary`2[System.__Canon,System.__Canon]::Remove(!0,!1&,int32)
instance class STAN.Client.PublishAck [STAN.CLIENT] STAN.Client.Connection::removeAck(string)
instance void [STAN.CLIENT] STAN.Client.PublishAck::ackTimerCb(object)

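After a quick look at the code, this snippet might be the cause when removeAck() passes 0 as the timeout to BlockingDictionary.Remove(). With timeout == 0 and an empty dictionary, neither branch calls Monitor.Wait or throws, so the loop spins without ever yielding: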

while (d.Count == 0)
{
    if (timeout < 0)
    {
        Monitor.Wait(dLock);
    }
    else
    {
        if (timeout > 0)
        {
            if (Monitor.Wait(dLock, timeout) == false)
            {
                throw new Exception("timeout");
            }
        }
    }
}
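One possible way to avoid the spin, shown only as a sketch: treat a timeout of 0 as "do not wait" and return immediately when the dictionary is empty. `SketchBlockingDictionary` is a hypothetical stand-in written for this example, not the STAN.CLIENT class; only the `Remove(key, out value, timeout)` shape is taken from the profiler output above, and the early return is an assumption, not necessarily the fix the maintainers shipped.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for STAN.Client.BlockingDictionary (not the library's code).
public sealed class SketchBlockingDictionary<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> d = new Dictionary<TKey, TValue>();
    private readonly object dLock = new object();

    public void Add(TKey key, TValue value)
    {
        lock (dLock)
        {
            d[key] = value;
            Monitor.PulseAll(dLock); // wake any thread waiting in Remove
        }
    }

    // timeout < 0: wait indefinitely; timeout == 0: do not wait;
    // timeout > 0: wait up to that many milliseconds.
    public bool Remove(TKey key, out TValue value, int timeout)
    {
        lock (dLock)
        {
            while (d.Count == 0)
            {
                if (timeout == 0)
                {
                    // Assumed fix: the caller asked not to wait and there is
                    // nothing to remove, so return instead of looping while
                    // holding the lock.
                    value = default(TValue);
                    return false;
                }

                if (timeout < 0)
                {
                    Monitor.Wait(dLock);
                }
                else if (!Monitor.Wait(dLock, timeout))
                {
                    throw new Exception("timeout");
                }
            }

            if (d.TryGetValue(key, out value))
            {
                d.Remove(key);
                return true;
            }
            return false;
        }
    }
}
```

With this guard, calling `Remove(key, out value, 0)` on an empty dictionary returns false right away, while negative and positive timeouts behave as in the original snippet.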

To reproduce it, I do the following (it doesn’t happen every time):

  • Crash NATS
    • I do this by sending a message which is too large for the MySQL data field. This was the reason we encountered the problem in the first place.
  • Restart NATS
  • Send a message to NATS through the STAN.CLIENT
  • Inspect the docker container for high CPU usage

Let me know if I need to provide a repository for reproducing this issue. I hope it’s not necessary 😃

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

ColinSullivan1 commented, Sep 10, 2019 (1 reaction)

FYI - I’ve found the issue, and have a fix but it’s not ideal imo. I want to clean up some of the code around publish ack processing to keep this clean. I’m hoping to have something tomorrow.

ColinSullivan1 commented, Sep 18, 2019 (0 reactions)

Good to hear! Thank you for testing, much appreciated!

If you do get more information about your performance issue or can isolate a test case please open an issue…
