Bot State failures when extending with a sub class
Version
4.5.3
Describe the bug
This is a two-parter:
- I am getting a DocumentClientException (I am using Cosmos DB for Bot State) with Message: {"Errors":["The request payload is invalid. Ensure to provide a valid request payload."]} when calling BotState.SaveChangesAsync at the end of IBot.OnTurnAsync. This seems to be happening for the ConversationState rather than UserState. I am getting no indication as to what might be wrong with the payload.
- In order to try to diagnose this scenario, I added a bunch of additional diagnostics and logging to my IBot implementation, along with a retry in case it is a transient error (it is not). However, the BotState.Get method is throwing a NullReferenceException: Value cannot be null. Parameter name: o, preventing me from inspecting and knowing what might be wrong with the bot state.
This second issue seems to be in the implementation of BotState: https://github.com/microsoft/botbuilder-dotnet/blob/master/libraries/Microsoft.Bot.Builder/BotState.cs
Specifically, the Get method uses the following code to get the cached state:
    var stateKey = this.GetType().Name;
    var cachedState = turnContext.TurnState.Get<object>(stateKey);
Meanwhile, the SaveChangesAsync method uses:
    var cachedState = turnContext.TurnState.Get<CachedBotState>(_contextServiceKey);
I assume _contextServiceKey and this.GetType().Name produce different key values in the case where I am extending ConversationState or UserState?
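To make the hypothesis above concrete, here is a minimal sketch that models only the two lookups quoted from BotState. Every type in it (FakeBotState, FakeConversationState, FakeCrewConversationState, KeyMismatchDemo) is a hypothetical stand-in, not an SDK type, and it assumes the base state class registers its cache entry under a fixed name passed to its constructor, which this issue does not confirm.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for BotState: models only the two key lookups quoted above.
public class FakeBotState
{
    private readonly string _contextServiceKey;

    public FakeBotState(string contextServiceKey) => _contextServiceKey = contextServiceKey;

    // Key a SaveChangesAsync-style lookup would use (fixed at construction).
    public string SaveLookupKey => _contextServiceKey;

    // Key a Get-style lookup would use (depends on the runtime type).
    public string GetLookupKey => GetType().Name;
}

// Assumption: the out-of-the-box state class passes its own name as the fixed key.
public class FakeConversationState : FakeBotState
{
    public FakeConversationState() : base(nameof(FakeConversationState)) { }
}

// A subclass, analogous to CrewConversationState in this issue.
public class FakeCrewConversationState : FakeConversationState { }

public static class KeyMismatchDemo
{
    // True when both lookups would hit the same turn-state entry.
    public static bool KeysMatch(FakeBotState state) =>
        state.SaveLookupKey == state.GetLookupKey;

    // Returns the cached entry for a key, or null when nothing is stored under it.
    public static object Lookup(IDictionary<string, object> turnState, string key) =>
        turnState.TryGetValue(key, out var value) ? value : null;
}
```

Under these assumptions, KeysMatch returns true for FakeConversationState and false for FakeCrewConversationState, so a Get-style lookup on the subclass finds nothing and hands null to whatever consumes the result.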
To Reproduce
Steps to reproduce the behavior:
- Extend ConversationState. Below is what I have, with some noise removed:
public class CrewConversationState : ConversationState
{
    ...
    private readonly IBotTelemetryClient _botTelemetryClient;

    public CrewConversationState(IStorage storage, ..., IBotTelemetryClient botTelemetryClient)
        : base(storage)
    {
        ...
        _botTelemetryClient = botTelemetryClient;
    }

    // Give me public access to storage keys
    public string GetKey(ITurnContext turnContext)
    {
        return GetStorageKey(turnContext);
    }

    protected override string GetStorageKey(ITurnContext turnContext)
    {
        ... // Custom implementation
    }
}
- Update the DialogBot that came from the Virtual Assistant template (again with some domain-specific noise removed):
public class DialogBot<T> : IBot
    where T : Dialog
{
    private readonly Dialog _dialog;
    private readonly BotState _conversationState;
    private readonly BotState _userState;
    ...
    private readonly IBotTelemetryClient _telemetryClient;

    public DialogBot(IServiceProvider serviceProvider, T dialog)
    {
        _dialog = dialog;
        _conversationState = serviceProvider.GetService<ConversationState>();
        _userState = serviceProvider.GetService<UserState>();
        ...
        _telemetryClient = serviceProvider.GetService<IBotTelemetryClient>();
    }

    public async Task OnTurnAsync(ITurnContext turnContext, CancellationToken cancellationToken = default(CancellationToken))
    {
        // Client notifying this bot took too long to respond (timed out)
        if (turnContext.Activity.Code == EndOfConversationCodes.BotTimedOut)
        {
            _telemetryClient.TrackTrace($"Timeout in {turnContext.Activity.ChannelId} channel: Bot took too long to respond.", Severity.Information, new Dictionary<string, string>
            {
                ...
            });
            return;
        }

        await _dialog.RunAsync(turnContext, _conversationState.CreateProperty<DialogState>(nameof(DialogState)), cancellationToken);

        // Save any state changes that might have occurred during the turn.
        try
        {
            await SaveBotState(_conversationState, turnContext, false, cancellationToken).ConfigureAwait(false);
            await SaveBotState(_userState, turnContext, false, cancellationToken).ConfigureAwait(false);
        }
        catch (DocumentClientException ex)
        {
            _telemetryClient.TrackException(ex);
            _telemetryClient.TrackEvent("Clearing all bot state, conversation and user, associated with current context", new Dictionary<string, string>
            {
                ...
            });
            await _conversationState.ClearStateAsync(turnContext, cancellationToken);
            await _userState.ClearStateAsync(turnContext, cancellationToken);
            await SaveBotState(_conversationState, turnContext, false, cancellationToken).ConfigureAwait(false);
            await SaveBotState(_userState, turnContext, false, cancellationToken).ConfigureAwait(false);
            ...
            await turnContext.SendActivityAsync(MainResponseStrings.BotStateReset);
        }
    }

    private async Task SaveBotState(BotState botState, ITurnContext turnContext, bool isRetry, CancellationToken cancellationToken = default(CancellationToken))
    {
        try
        {
            await botState.SaveChangesAsync(turnContext, false, cancellationToken).ConfigureAwait(false);
        }
        catch (DocumentClientException ex)
        {
            try
            {
                var state = botState.Get(turnContext);
                string key = null;
                if (botState is CrewConversationState conversationState)
                {
                    key = conversationState.GetKey(turnContext);
                }
                else if (botState is CrewUserState userState)
                {
                    key = userState.GetKey(turnContext);
                }

                _telemetryClient.TrackEvent("Failed to save bot state.", new Dictionary<string, string>
                {
                    ...,
                    ["botStateType"] = botState.GetType().FullName,
                    ["isRetry"] = isRetry.ToString(),
                    ["exception"] = ex.ToString(),
                    ["state"] = state?.ToString(),
                    ["key"] = key
                });
            }
            catch (Exception logException)
            {
                // After logging, suppress exceptions in trying to log additional context and details about original error
                _telemetryClient.TrackException(logException, new Dictionary<string, string>
                {
                    ...,
                    ["botStateType"] = botState.GetType().FullName,
                    ["isRetry"] = isRetry.ToString()
                });
            }

            if (isRetry)
            {
                throw;
            }

            await SaveBotState(botState, turnContext, true, cancellationToken).ConfigureAwait(false);
        }
    }
}
The purpose here is collecting more diagnostic data related to the intermittent error when saving. In App Insights I am not seeing the “Failed to save bot state.” event because I end up in the second catch-block of the SaveBotState method.
- Receive intermittent DocumentClientException when saving. (Sorry, no idea how to reproduce. I am hoping for ideas, and this is why I’m trying to collect more diagnostic data.)
- Observe in logs that instead of getting additional diagnostic data, I get NullReferenceException from the line var state = botState.Get(turnContext); in the catch block of DialogBot.SaveBotState.
Expected behavior
- A DocumentClientException with message Message: {"Errors":["The request payload is invalid. Ensure to provide a valid request payload."]} coming from a BotState should provide at least some indication as to what is invalid in the state.
- (Assuming my hypothesis is correct for the second part) Calling Get should always produce the same state that would be saved by SaveChangesAsync, regardless of whether the OOTB BotState classes are inherited from or not.
Screenshots
N/A
Additional context
I am not necessarily expecting a conclusive answer to the first part, but any tips/pointers would be appreciated. For the second part, I believe I pin-pointed a clear bug in the SDK implementation that I certainly would expect to be conclusively fixed. (I would PR…but I haven’t the time to invest in learning this codebase enough to feel confident in any changes)
Issue Analytics
- Created 3 years ago
- Comments: 7 (3 by maintainers)
Top GitHub Comments
Unfortunately, I don’t have the bandwidth to investigate the first problem further. I updated my diagnostic code to work around the second problem (by pulling from TurnState directly in the correct way instead of using the Get method). I haven’t seen the intermittent issue in some time, however…
I peeked into the PR and it looks to be spot-on for addressing the second problem. No objections from me if this issue closes after that PR merges since that was the better understood and actionable problem, anyway.
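The comment above does not show the workaround itself, so the following is only a rough sketch of one way diagnostic code could read the cached entry without going through Get. It uses a plain IDictionary<string, object> as a stand-in for the turn state collection, and the TurnStateLookup class and FindCachedState method are hypothetical names. It assumes the entry is stored under the name of one of the state object's base types, as hypothesized in the issue.

```csharp
using System;
using System.Collections.Generic;

public static class TurnStateLookup
{
    // Walks from the runtime type up through its base types and returns the first
    // non-null entry stored under a type name, or null when nothing is cached.
    // Assumption: the cache key is the simple name of some type in the hierarchy.
    public static object FindCachedState(IDictionary<string, object> turnState, object botState)
    {
        if (turnState == null || botState == null)
        {
            return null;
        }

        for (var type = botState.GetType(); type != null && type != typeof(object); type = type.BaseType)
        {
            if (turnState.TryGetValue(type.Name, out var cached) && cached != null)
            {
                return cached;
            }
        }

        return null;
    }
}
```

Because the helper returns null instead of throwing, logging code in a catch block can record "no cached state found" and still report the original exception.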
@praveenck06 - I continue to intermittently get the error and have since updated to the absolute latest version of Bot Builder libraries.
It sounds like perhaps you made some progress understanding the cause, or at least how to reproduce? Perhaps you can open a new issue detailing your findings? I haven’t been able to reproduce reliably, but if you link it here (just so I can find it) I’d love to jump in on that new issue and at least help explore further / corroborate your findings if you can provide a bit more details. I doubt this closed issue will get much attention…