Subscriptions topic stalls after some time.

Is there an existing issue for this?

  • I have searched the existing issues

Product

Hot Chocolate

Describe the bug

The subscription topic randomly stops working after some time. The client is still subscribed, ping/pongs are still being sent, and new messages still arrive for other topics, just not for the failed one. It is not limited to a specific topic: sometimes topicA fails, another time it is topicB, but eventually they all fail.

I have made a minimal solution that reproduces the issue; you can find it in this repo: https://github.com/DownGoat/HotChocolate13SubIssue

It has a single query with properties that take some time to resolve (simulating DataLoaders and slow DB calls), and a single subscription that returns some random data. This is very similar to the setup in our project, where a controller endpoint receives data every 5 seconds and pushes it to the topic that usually stalls first. That topic has the most subscribers and is the only one that frequently sends new messages.
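
The shape of the repro is roughly the following (a minimal sketch only; the names VesselPosition, Subscription.ListVessels, and WeatherForecastController are illustrative stand-ins, not necessarily what the linked repo uses): a subscription bound to a topic, and a controller that publishes to that topic via ITopicEventSender.

// Illustrative sketch - names and shapes are stand-ins, not necessarily the linked repo's code.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using HotChocolate;
using HotChocolate.Subscriptions;
using HotChocolate.Types;
using Microsoft.AspNetCore.Mvc;

public record VesselPosition(DateTime TimeStamp, int Int1, int Int2, int Int3);

public class Subscription
{
    // Each active client subscription gets its own channel on the topic shard.
    [Subscribe]
    [Topic("ListVessels")]
    public IEnumerable<VesselPosition> ListVessels(
        [EventMessage] IEnumerable<VesselPosition> positions) => positions;
}

[ApiController]
[Route("[controller]")]
public class WeatherForecastController : ControllerBase
{
    private readonly ITopicEventSender _sender;

    public WeatherForecastController(ITopicEventSender sender) => _sender = sender;

    // Every GET publishes a batch of three random positions to the "ListVessels" topic.
    [HttpGet]
    public async Task<IActionResult> Get(CancellationToken ct)
    {
        var positions = Enumerable
            .Range(0, 3)
            .Select(_ => new VesselPosition(
                DateTime.UtcNow, Random.Shared.Next(), Random.Shared.Next(), Random.Shared.Next()))
            .ToList();

        await _sender.SendAsync<IEnumerable<VesselPosition>>("ListVessels", positions, ct);
        return Ok();
    }
}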

I have tested the same solution with version 12.17.0, and I have not managed to reproduce the issue. We first noticed it after upgrading to version 13.

Steps to reproduce

  1. Start the following subscription:

     subscription VesselPositions { listVessels { timeStamp int1 int2 int3 } }

  2. Send data to the topic with a GET request to https://localhost:5001/WeatherForecast. This endpoint publishes a list of three entities with random data for the intN fields and the current time. I use Postman to repeat this request indefinitely with a 10 ms delay between requests.

  3. Start and stop the subscription in BCP; after a few tries the subscription stalls and you won't receive any new data. If this takes a long time, open a new tab in BCP and run the following query, and keep starting and stopping the subscription while it resolves (a sketch of resolvers slow enough for this purpose follows these steps):

     query WatchingPaintDry { slowEntity { prop1 prop2 prop3 prop4 } }
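
The slow query in step 3 can be approximated with resolvers that simply delay (again, an illustrative sketch; names and delays are not taken from the linked repo):

// Hypothetical sketch of the slow query used in step 3.
using System;
using System.Threading.Tasks;

public class SlowEntity
{
    // Each resolver delays to simulate a DataLoader / slow database call.
    public async Task<string> GetProp1() { await Task.Delay(TimeSpan.FromSeconds(5)); return "prop1"; }
    public async Task<string> GetProp2() { await Task.Delay(TimeSpan.FromSeconds(5)); return "prop2"; }
    public async Task<string> GetProp3() { await Task.Delay(TimeSpan.FromSeconds(5)); return "prop3"; }
    public async Task<string> GetProp4() { await Task.Delay(TimeSpan.FromSeconds(5)); return "prop4"; }
}

public class Query
{
    public SlowEntity GetSlowEntity() => new();
}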

Relevant log output

No response

Additional Context?

No response

Version

13.x.x

Issue Analytics

  • State: closed
  • Created: 7 months ago
  • Reactions: 5
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

5 reactions
nikolai-mb commented, Feb 21, 2023

We are having the same problem. After spending quite a lot of time debugging this in the HotChocolate libraries, we managed to narrow down the issue to the HotChocolate.Subscriptions.TopicShard<T> class.

TL;DR: Closed subscriptions are not cleaned up correctly in HC (we think), and there is a major bug which in theory should impact anyone using subscriptions.

As far as we can tell: a new channel is added to the _outgoing list in the topic shard when a new subscription is created. When the subscription is removed (clicking the stop / cancel button in BCP), the socket is stopped in the browser, but the channel is never removed from that list in Hot Chocolate.

When reproducing, we see the following: the topic itself has an outbound channel with a default buffer of 64 messages. Each subscription gets its own channel as an outgoing buffer. For the topic to "complete" an outgoing message, each subscriber needs to receive and acknowledge its copy. But because closing a subscription never completes / removes the associated channel, the _outgoing list in the TopicShard grows by one with every subscription and is never reduced again.

This in turn blocks the topic's incoming queue, which reaches its capacity of 64 items; after that, messages are never processed, because the writes to the "dead" subscription's queue are still pending.

We have reproduced this with BOTH the graphql-ws and graphql-transport-ws protocols, as well as with both the in-memory and Redis subscription handlers.

We have also confirmed that this issue is per topic. Create and stop a subscription, then publish 64 messages to the topic, and the topic will be stalled indefinitely.

It also seems like one subscription can stall the entire topic's processing if it never acknowledges its messages.

The channel removal / cleanup issue is also very likely to be the source of this Redis-specific issue: https://github.com/ChilliCream/graphql-platform/issues/5336
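
To make the mechanism concrete, here is a small standalone model using plain System.Threading.Channels (NOT Hot Chocolate's actual implementation, just an illustration of the behavior described above): a topic fans messages out to one bounded channel per subscriber, and a single subscriber channel that is never drained or removed eventually stalls the whole topic.

// Standalone model of the described behavior - not Hot Chocolate's actual code.
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

var incoming = Channel.CreateBounded<int>(64);        // the topic's inbound queue (capacity 64)
var subscribers = new List<Channel<int>>();           // one outgoing channel per subscriber

Channel<int> Subscribe()
{
    var ch = Channel.CreateBounded<int>(
        new BoundedChannelOptions(64) { FullMode = BoundedChannelFullMode.Wait });
    subscribers.Add(ch);                              // never removed, mirroring the reported bug
    return ch;
}

// Fan-out loop: a message is only "done" once every subscriber channel has accepted it.
_ = Task.Run(async () =>
{
    await foreach (var msg in incoming.Reader.ReadAllAsync())
    {
        foreach (var sub in subscribers)
        {
            await sub.Writer.WriteAsync(msg);         // blocks once a subscriber's buffer is full
        }
    }
});

Subscribe();                                          // client "disconnects" but its channel stays

for (var i = 1; i <= 200; i++)
{
    var write = incoming.Writer.WriteAsync(i).AsTask();
    if (await Task.WhenAny(write, Task.Delay(1000)) != write)
    {
        // With a 64-item subscriber buffer and a 64-item inbound queue, this happens
        // after roughly 128 messages: the topic is stalled for everyone.
        Console.WriteLine($"Topic stalled at message {i}.");
        break;
    }
}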

4 reactions
Sibusten commented, Mar 7, 2023

Michael posted this workaround for in-memory subscriptions in the Slack, and it seems to work for me:

builder.Services.AddGraphQLServer()
    .AddInMemorySubscriptions(new SubscriptionOptions
    {
        TopicBufferFullMode = TopicBufferFullMode.DropOldest
    });
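
For context on the trade-off (my reading, not from the thread): TopicBufferFullMode.DropOldest makes the topic's bounded buffer discard its oldest message when it is full instead of blocking the writer, so a subscriber channel that is never drained can no longer stall the whole topic; the cost is that slow or orphaned subscribers may silently miss messages rather than receiving every one.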

