Bot State failures when extending with a subclass

Version

4.5.3

Describe the bug

This is a 2-parter:

  1. I am getting a DocumentClientException (I am using Cosmos DB for bot state) with Message: {"Errors":["The request payload is invalid. Ensure to provide a valid request payload."]} when calling BotState.SaveChangesAsync at the end of IBot.OnTurnAsync. This seems to happen for the ConversationState rather than the UserState. The exception gives no indication of what might be wrong with the payload.

  2. To diagnose this, I added extra diagnostics and logging to my IBot implementation, along with a retry in case the error is transient (it is not). However, the BotState.Get method throws a NullReferenceException: Value cannot be null. Parameter name: o, which prevents me from inspecting the bot state to find out what might be wrong with it.

This second issue seems to be in the implementation of BotState: https://github.com/microsoft/botbuilder-dotnet/blob/master/libraries/Microsoft.Bot.Builder/BotState.cs

Specifically, the Get method uses the following code to get the cached state:

    var stateKey = this.GetType().Name;
    var cachedState = turnContext.TurnState.Get<object>(stateKey);

Meanwhile, the SaveChangesAsync method uses:

    var cachedState = turnContext.TurnState.Get<CachedBotState>(_contextServiceKey);

I assume _contextServiceKey and this.GetType().Name produce different keys when I extend ConversationState or UserState, since GetType().Name returns the name of the subclass (for example, CrewConversationState)?
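The suspected mismatch can be illustrated with a small, self-contained C# model. This is a sketch, not the Bot Builder SDK: TurnState is modeled as a plain Dictionary<string, object>, and StateModel, ConversationStateModel, and CrewConversationStateModel are hypothetical stand-ins. It assumes, as suspected above, that the base class caches its state under a key fixed in its constructor (here, its own class name), while the lookup in Get uses GetType().Name, which for a subclass is the subclass's name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified model of the suspected key mismatch.
// It does not use the Bot Builder SDK: TurnState is modeled as a plain dictionary.
public class StateModel
{
    // Key chosen once in the constructor (stands in for _contextServiceKey).
    private readonly string _contextServiceKey;

    public StateModel(string contextServiceKey)
    {
        _contextServiceKey = contextServiceKey;
    }

    // Stands in for loading state: the cache is stored under the constructor key.
    public void Load(Dictionary<string, object> turnState, object state)
    {
        turnState[_contextServiceKey] = state;
    }

    // Stands in for SaveChangesAsync: reads the cache with the constructor key.
    public object GetForSave(Dictionary<string, object> turnState)
    {
        return turnState.TryGetValue(_contextServiceKey, out var cached) ? cached : null;
    }

    // Stands in for the reported Get: reads the cache with the runtime type name.
    public object GetByTypeName(Dictionary<string, object> turnState)
    {
        return turnState.TryGetValue(GetType().Name, out var cached) ? cached : null;
    }
}

// Stand-in for ConversationState, assumed to pass its own class name as the key.
public class ConversationStateModel : StateModel
{
    public ConversationStateModel() : base(nameof(ConversationStateModel)) { }
}

// Stand-in for the CrewConversationState subclass from the issue.
public class CrewConversationStateModel : ConversationStateModel { }

public static class Program
{
    public static void Main()
    {
        var baseTurnState = new Dictionary<string, object>();
        var plain = new ConversationStateModel();
        plain.Load(baseTurnState, "cached state");
        Console.WriteLine(plain.GetByTypeName(baseTurnState) != null); // True: type name equals the key

        var crewTurnState = new Dictionary<string, object>();
        var crew = new CrewConversationStateModel();
        crew.Load(crewTurnState, "cached state");
        Console.WriteLine(crew.GetForSave(crewTurnState) != null);    // True
        Console.WriteLine(crew.GetByTypeName(crewTurnState) != null); // False: looks up "CrewConversationStateModel"
    }
}
```

Under these assumptions, the base class finds its cache through either path, but the subclass only finds it through the save path. The type-name lookup returns null, which would be consistent with Get failing when called on CrewConversationState.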

To Reproduce

Steps to reproduce the behavior:

  1. Extend ConversationState. Below is what I have, with some noise removed:
    public class CrewConversationState : ConversationState
    {
        ...
        private readonly IBotTelemetryClient _botTelemetryClient;

        public CrewConversationState(IStorage storage, ..., IBotTelemetryClient botTelemetryClient)
            : base(storage)
        {
            ...
            _botTelemetryClient = botTelemetryClient;
        }

        // Give me public access to storage keys
        public string GetKey(ITurnContext turnContext)
        {
            return GetStorageKey(turnContext);
        }

        protected override string GetStorageKey(ITurnContext turnContext)
        {
            ... // Custom implementation
        }
    }
  2. Update the DialogBot that came from the Virtual Assistant template (again with some domain-specific noise removed):
    public class DialogBot<T> : IBot
        where T : Dialog
    {
        private readonly Dialog _dialog;
        private readonly BotState _conversationState;
        private readonly BotState _userState;
        ...
        private readonly IBotTelemetryClient _telemetryClient;

        public DialogBot(IServiceProvider serviceProvider, T dialog)
        {
            _dialog = dialog;
            _conversationState = serviceProvider.GetService<ConversationState>();
            _userState = serviceProvider.GetService<UserState>();
            ...
            _telemetryClient = serviceProvider.GetService<IBotTelemetryClient>();
        }

        public async Task OnTurnAsync(ITurnContext turnContext, CancellationToken cancellationToken = default(CancellationToken))
        {
            // Client notifying this bot took too long to respond (timed out)
            if (turnContext.Activity.Code == EndOfConversationCodes.BotTimedOut)
            {
                _telemetryClient.TrackTrace($"Timeout in {turnContext.Activity.ChannelId} channel: Bot took too long to respond.", Severity.Information, new Dictionary<string, string>
                {
                    ...
                });
                return;
            }

            await _dialog.RunAsync(turnContext, _conversationState.CreateProperty<DialogState>(nameof(DialogState)), cancellationToken);

            // Save any state changes that might have occurred during the turn.
            try
            {
                await SaveBotState(_conversationState, turnContext, false, cancellationToken).ConfigureAwait(false);
                await SaveBotState(_userState, turnContext, false, cancellationToken).ConfigureAwait(false);
            }
            catch (DocumentClientException ex)
            {
                _telemetryClient.TrackException(ex);
                _telemetryClient.TrackEvent("Clearing all bot state, conversation and user, associated with current context", new Dictionary<string, string>
                {
                    ...
                });
                await _conversationState.ClearStateAsync(turnContext, cancellationToken);
                await _userState.ClearStateAsync(turnContext, cancellationToken);
                await SaveBotState(_conversationState, turnContext, false, cancellationToken).ConfigureAwait(false);
                await SaveBotState(_userState, turnContext, false, cancellationToken).ConfigureAwait(false);
                ...
                await turnContext.SendActivityAsync(MainResponseStrings.BotStateReset);
            }
        }

        private async Task SaveBotState(BotState botState, ITurnContext turnContext, bool isRetry, CancellationToken cancellationToken = default(CancellationToken))
        {
            try
            {
                await botState.SaveChangesAsync(turnContext, false, cancellationToken).ConfigureAwait(false);
            }
            catch (DocumentClientException ex)
            {
                try
                {
                    var state = botState.Get(turnContext);
                    string key = null;
                    if (botState is CrewConversationState conversationState)
                    {
                        key = conversationState.GetKey(turnContext);
                    }
                    else if (botState is CrewUserState userState)
                    {
                        key = userState.GetKey(turnContext);
                    }
                    _telemetryClient.TrackEvent("Failed to save bot state.", new Dictionary<string, string>
                    {
                        ...,
                        ["botStateType"] = botState.GetType().FullName,
                        ["isRetry"] = isRetry.ToString(),
                        ["exception"] = ex.ToString(),
                        ["state"] = state?.ToString(),
                        ["key"] = key
                    });
                }
                catch (Exception logException)
                {
                    // After logging, suppress exceptions in trying to log additional context and details about original error
                    _telemetryClient.TrackException(logException, new Dictionary<string, string>
                    {
                        ...,
                        ["botStateType"] = botState.GetType().FullName,
                        ["isRetry"] = isRetry.ToString()
                    });
                }
                if (isRetry)
                {
                    throw;
                }
                await SaveBotState(botState, turnContext, true, cancellationToken).ConfigureAwait(false);
            }
        }
    }

The purpose of this code is to collect more diagnostic data about the intermittent error when saving. In Application Insights, however, I never see the “Failed to save bot state.” event, because execution ends up in the second catch block of the SaveBotState method.

  3. Receive an intermittent DocumentClientException when saving. (Sorry, I have no idea how to reproduce it; I am hoping for ideas, which is why I am trying to collect more diagnostic data.)

  4. Observe in the logs that, instead of the additional diagnostic data, I get a NullReferenceException from the line var state = botState.Get(turnContext); in the catch block of DialogBot.SaveBotState.

Expected behavior

  1. A DocumentClientException with the message Message: {"Errors":["The request payload is invalid. Ensure to provide a valid request payload."]} coming from a BotState should give at least some indication of what is invalid in the state.

  2. (Assuming my hypothesis for the second part is correct.) Calling Get should always return the same state that SaveChangesAsync would save, regardless of whether the out-of-the-box BotState classes are subclassed.
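One way to get the expected behavior is to route every read of the cached state through a single helper that uses the key stored in the constructor. The sketch below uses the same modeling assumptions as before: TurnState is a plain dictionary, and FixedStateModel and its subclasses are hypothetical names. It illustrates the expectation and does not show the change actually made in the SDK.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the expected behavior: a single key, fixed in the
// constructor, is used by every read of the cached state.
public class FixedStateModel
{
    private readonly string _contextServiceKey;

    public FixedStateModel(string contextServiceKey)
    {
        _contextServiceKey = contextServiceKey ?? throw new ArgumentNullException(nameof(contextServiceKey));
    }

    public void Load(IDictionary<string, object> turnState, object state) => turnState[_contextServiceKey] = state;

    // Diagnostic read (stands in for Get).
    public object Get(IDictionary<string, object> turnState) => ReadCache(turnState);

    // Save-path read (stands in for SaveChangesAsync).
    public object GetForSave(IDictionary<string, object> turnState) => ReadCache(turnState);

    // Both reads share this private helper and neither is virtual,
    // so a subclass cannot make them use different keys.
    private object ReadCache(IDictionary<string, object> turnState) =>
        turnState.TryGetValue(_contextServiceKey, out var cached) ? cached : null;
}

// Stand-in for ConversationState (assumed to pass its own name as the key).
public class FixedConversationStateModel : FixedStateModel
{
    public FixedConversationStateModel() : base(nameof(FixedConversationStateModel)) { }
}

// Stand-in for the CrewConversationState subclass.
public class FixedCrewConversationStateModel : FixedConversationStateModel { }

public static class Program
{
    public static void Main()
    {
        var turnState = new Dictionary<string, object>();
        var crew = new FixedCrewConversationStateModel();
        crew.Load(turnState, "cached state");

        Console.WriteLine(crew.Get(turnState) != null);                                     // True
        Console.WriteLine(ReferenceEquals(crew.Get(turnState), crew.GetForSave(turnState))); // True
    }
}
```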

Screenshots

N/A

Additional context

I am not necessarily expecting a conclusive answer to the first part, but any tips or pointers would be appreciated. For the second part, I believe I have pinpointed a clear bug in the SDK implementation, and I would expect it to be fixed. (I would open a PR, but I don’t have the time to learn this codebase well enough to feel confident in any changes.)

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
bojingo commented, Apr 13, 2020

Unfortunately, I don’t have the bandwidth to investigate the first problem further. I updated my diagnostic code to work around the second problem by reading the cached state from TurnState directly, in the correct way, instead of calling the Get method. However, I haven’t seen the intermittent issue in some time.
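The workaround described here, reading the cached state from TurnState directly instead of calling Get, can be sketched as follows. In the SDK this would presumably be the issue's own TurnState.Get<object>(key) call with the base class's key (for example nameof(ConversationState)), assuming that is the key the base class uses. The sketch only models that lookup: TurnState is a dictionary, and CachedStateReader, ConversationStateStub, and CrewConversationStateStub are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class CachedStateReader
{
    // Hypothetical helper: look up the cached state under the key of the base
    // state class, not the subclass's runtime type name. Returns null instead of
    // throwing when nothing is cached, so diagnostic logging can still record that.
    public static object ReadCachedState(IDictionary<string, object> turnState, Type baseStateType)
    {
        if (turnState == null) throw new ArgumentNullException(nameof(turnState));
        if (baseStateType == null) throw new ArgumentNullException(nameof(baseStateType));
        return turnState.TryGetValue(baseStateType.Name, out var cached) ? cached : null;
    }
}

// Hypothetical stand-ins for the SDK base class and the issue's subclass.
public class ConversationStateStub { }
public class CrewConversationStateStub : ConversationStateStub { }

public static class Program
{
    public static void Main()
    {
        // Simulate a turn whose state was cached under the base class name.
        var turnState = new Dictionary<string, object>
        {
            [nameof(ConversationStateStub)] = "cached state",
        };

        var viaBase = CachedStateReader.ReadCachedState(turnState, typeof(ConversationStateStub));
        var viaSubclass = CachedStateReader.ReadCachedState(turnState, typeof(CrewConversationStateStub));

        Console.WriteLine(viaBase ?? "(null)");     // cached state
        Console.WriteLine(viaSubclass ?? "(null)"); // (null)
    }
}
```

Passing the base type explicitly, rather than calling GetType() on the state object, keeps the lookup key stable no matter how the state class is subclassed.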

I looked at the PR, and it appears to address the second problem exactly. I have no objection to closing this issue once that PR merges, since that was the better-understood and more actionable problem anyway.

0 reactions
bojingo commented, Aug 11, 2020

@praveenck06 - I still get the error intermittently, and I have since updated to the latest version of the Bot Builder libraries.

It sounds like you may have made some progress in understanding the cause, or at least in reproducing it. Could you open a new issue detailing your findings? I haven’t been able to reproduce it reliably, but if you link the new issue here (just so I can find it), I’d love to join in and help explore further or corroborate your findings if you can provide a bit more detail. I doubt this closed issue will get much attention.
