Implicit stream subscriptions seem to stop being called after some time


We’re seeing a grain that is implicitly subscribed to multiple streams suddenly stop having its OnNextAsync() method called. Other grains explicitly subscribed to one of the streams continue to work as expected, as do other implicitly subscribed grains.

Orleans Version:

Microsoft Orleans Client => 2.0.3
Microsoft Orleans Core => 2.0.3
Microsoft Orleans Core_Legacy => 2.0.3
Microsoft Orleans Core_Abstractions => 2.0.0
Microsoft Orleans OrleansCodeGenerator_Build => 2.0.3
Microsoft Orleans OrleansProviders => 2.0.4
Microsoft Orleans Runtime => 2.0.4
Microsoft Orleans Server => 2.0.4
Microsoft Orleans Runtime_Legacy => 2.0.4
Microsoft Orleans TestingHost => 2.0.4

I have a grain class defined similar to:

[StorageProvider(ProviderName = DocumentPersistence.ServiceLifetimeRedisStorage)]
[ImplicitStreamSubscription(DocumentStreams.DocumentA)]
[ImplicitStreamSubscription(DocumentStreams.DocumentB)]
public class MyGrain : Grain<MyGrainState>,
    IMyGrain,
    IAsyncObserver<IEnumerable<DocumentA>>,
    IAsyncObserver<IEnumerable<DocumentB>>
{
    public override async Task OnActivateAsync()
    {
        var id = this.GetPrimaryKey();

        var provider = GetStreamProvider(DocumentStreams.StreamProvider);

        var documentAStream = provider.GetDocumentStream<DocumentA>(id);
        await documentAStream.SubscribeAsync(this);

        var documentBStream = provider.GetDocumentStream<DocumentB>(id);
        await documentBStream.SubscribeAsync(this);
    }

    public async Task HandleDocument(DocumentA change)
    {
        // ...
    }

    public async Task HandleDocument(DocumentB change)
    {
        // ...
    }

    public async Task OnNextAsync(IEnumerable<DocumentA> items, StreamSequenceToken token = null)
    {
        var tasks = new List<Task>();
        foreach (var document in items)
        {
            tasks.Add(HandleDocument(document));
        }

        await Task.WhenAll(tasks);
    }

    public async Task OnNextAsync(IEnumerable<DocumentB> items, StreamSequenceToken token = null)
    {
        var tasks = new List<Task>();
        foreach (var document in items)
        {
            tasks.Add(HandleDocument(document));
        }

        await Task.WhenAll(tasks);
    }

    public Task OnCompletedAsync() => Task.CompletedTask;

    public Task OnErrorAsync(Exception ex)
    {
        GetSerilog().Error(ex, "An exception occurred while receiving stream data");
        return Task.CompletedTask;
    }
}

Most of the time all is well: OnActivateAsync() fires, followed by OnNextAsync() calls, as one would expect. However, after being left running overnight, the grain no longer receives any OnNextAsync() calls at all. Other grains that are subsequently and explicitly subscribed to the streams continue to work.

I should point out that we are using a Redis stream provider and a Redis storage provider; however, all other stream subscriptions (both explicit and implicit) are working fine. Seemingly, it’s only this grain with two implicit subscriptions that is no longer receiving stream messages.

Some things I’ve tried to suss out what is actually going on:

  • Tested other stream flows
  • Set GrainCollectionOptions.CollectionAge = TimeSpan.FromMinutes(1) to force grain collection. I was able to repro the issue, but only after letting it run overnight. Once in this state, even after waiting a number of minutes, I was never able to get OnActivateAsync() to fire in the grain. This seems to imply that either the grain was inaccessible due to some deadlock or busy-waiting, or the grain was never activated and sent the message. Given that the grain is relatively simple and devoid of any locks, I suspect the latter.
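
For reference, the collection-age setting from the second bullet could be applied roughly as follows. This is a minimal sketch, assuming the Orleans 2.0 ISiloHostBuilder hosting model; the class and method names (SiloCollectionConfigSketch, UseShortCollectionAge) are hypothetical and do not come from the issue.

```csharp
using System;
using Orleans.Configuration;
using Orleans.Hosting;

public static class SiloCollectionConfigSketch
{
    // Hypothetical helper: shortens the idle time after which grain
    // activations become eligible for collection (deactivation).
    public static ISiloHostBuilder UseShortCollectionAge(ISiloHostBuilder builder)
    {
        return builder.Configure<GrainCollectionOptions>(options =>
        {
            // Value taken from the issue text. Some Orleans versions may
            // require CollectionAge to exceed CollectionQuantum, so check
            // the rules for the version you run.
            options.CollectionAge = TimeSpan.FromMinutes(1);
        });
    }
}
```

With a short collection age, an idle grain should be deactivated quickly, so the next stream message would be expected to trigger a fresh OnActivateAsync() call. That makes a missing activation easier to spot in logs.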

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 36 (36 by maintainers)

Top GitHub Comments

shlomiw commented, Nov 5, 2018 (2 reactions)

@berdon - it would be helpful if you’d share, in general, what the issue was. Maybe we could learn from it as well. Thanks in advance.

berdon commented, Nov 5, 2018 (1 reaction)

An unfortunate return instead of continue in OnNextAsync. So, just a trivial programmer error. 😦
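
The faulty code itself was not posted, so the following is only an illustration of this class of bug, not the author's actual handler. It is a minimal sketch, assuming a batch handler that skips some items: with return, the first skipped item ends the whole call and the rest of the batch is silently dropped, whereas continue moves on to the next item. All names here (BatchHandlerSketch, Document, IsRelevant) are hypothetical, and the sketch does not depend on Orleans.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BatchHandlerSketch
{
    // Hypothetical stand-in for a streamed document; not a type from the issue.
    public sealed class Document
    {
        public Document(int id, bool isRelevant) { Id = id; IsRelevant = isRelevant; }
        public int Id { get; }
        public bool IsRelevant { get; }
    }

    // Buggy variant: `return` leaves the method at the first irrelevant item,
    // so every later item in the batch is never handled.
    public static async Task<List<int>> HandleBatchWithReturn(IEnumerable<Document> items)
    {
        var handled = new List<int>();
        foreach (var document in items)
        {
            if (!document.IsRelevant)
            {
                return handled; // bug: abandons the rest of the batch
            }
            await Task.Yield(); // placeholder for real asynchronous work
            handled.Add(document.Id);
        }
        return handled;
    }

    // Fixed variant: `continue` skips only the irrelevant item.
    public static async Task<List<int>> HandleBatchWithContinue(IEnumerable<Document> items)
    {
        var handled = new List<int>();
        foreach (var document in items)
        {
            if (!document.IsRelevant)
            {
                continue; // skip this item, keep processing the batch
            }
            await Task.Yield(); // placeholder for real asynchronous work
            handled.Add(document.Id);
        }
        return handled;
    }

    public static void Main()
    {
        var batch = new[]
        {
            new Document(1, true),
            new Document(2, false),
            new Document(3, true),
        };

        var buggy = HandleBatchWithReturn(batch).GetAwaiter().GetResult();
        var corrected = HandleBatchWithContinue(batch).GetAwaiter().GetResult();

        Console.WriteLine(string.Join(",", buggy));     // prints "1"
        Console.WriteLine(string.Join(",", corrected)); // prints "1,3"
    }
}
```

Neither variant throws, so a bug like this leaves no error in the logs; from the outside it can look as though messages simply stopped arriving.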

