Bot will only speak when spoken to
Version
Browser version (from the page's meta tags):
- botframework-directlinespeech:version: 4.8.0
- botframework-webchat:bundle:variant: full
- botframework-webchat:bundle:version: 4.8.0
- botframework-webchat:core:version: 4.8.0
- botframework-webchat:ui:version: 4.8.0
Speech Services support is added via:
<script>
  (async function() {
    const token = '{{ dl_token }}';

    // Create the ponyfill factory function, which can be called to create a
    // concrete implementation of the ponyfill.
    const webSpeechPonyfillFactory = await window.WebChat.createCognitiveServicesSpeechServicesPonyfillFactory({
      textNormalization: 'lexical',
      authorizationToken: '{{ sp_token }}',
      region: '{{ region }}'
    });

    // Pass a Web Speech ponyfill factory to renderWebChat.
    // You can also use your own speech engine, provided it is compliant with
    // the W3C Web Speech API: https://w3c.github.io/speech-api/.
    // For implementors, see createBrowserWebSpeechPonyfill.js for details.
    window.WebChat.renderWebChat(
      {
        directLine: window.WebChat.createDirectLine({ token }),
        webSpeechPonyfillFactory,
        userID: 'prt-{{ user_id }}',
        username: 'Student',
        locale: 'en-US'
      },
      document.getElementById('webchat')
    );

    document.querySelector('#webchat > *').focus();
  })().catch(err => console.error(err));
</script>
Tokens, etc., are generated in a Django app and injected via a template, hence the {{ variable }} placeholders.
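For reference, the server side can mint both tokens before rendering the template. A minimal sketch, assuming a Direct Line secret and a Speech Services subscription key; the endpoint URLs are the documented Direct Line and Speech Services token endpoints, and the function names are illustrative:

```javascript
// Sketch: build the two token requests a server-side view would issue before
// rendering the template. The actual fetch/HTTP call is left to the server
// framework (here it would be the Django app).

function buildDirectLineTokenRequest(directLineSecret) {
  // Direct Line 3.0 token generation endpoint.
  return {
    url: 'https://directline.botframework.com/v3/directline/tokens/generate',
    options: {
      method: 'POST',
      headers: { Authorization: `Bearer ${directLineSecret}` }
    }
  };
}

function buildSpeechTokenRequest(region, subscriptionKey) {
  // Cognitive Services Speech token endpoint for the given region.
  return {
    url: `https://${region}.api.cognitive.microsoft.com/sts/v1.0/issueToken`,
    options: {
      method: 'POST',
      headers: { 'Ocp-Apim-Subscription-Key': subscriptionKey }
    }
  };
}

// Usage (server side, hypothetical):
//   const { url, options } = buildDirectLineTokenRequest(DIRECT_LINE_SECRET);
//   const { token } = await (await fetch(url, options)).json();
```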
The system uses both LUIS and Speech Services.
Bot is written in C# and all outputs have a spoken component.
Describe the bug
The bot will only respond with speech if the input is spoken. Even if the user speaks first (see #2474), every subsequent input must be audio for the bot to respond with audio. Typed input, or input from a card, only produces text output, even if the user has spoken earlier in the conversation.
Steps to reproduce
- Create a speech-enabled Web Chat channel following https://github.com/microsoft/BotFramework-WebChat/tree/master/samples/03.speech/c.cognitive-speech-services-with-lexical-result
- Create a bot and generate all messages using:
protected static IActivity Speak(string message, string textToShow = null, string language = "en-US", string voice = "JessaRUS")
{
    var activity = MessageFactory.Text(textToShow ?? message);

    string body =
        @"<speak version='1.0' xmlns='https://www.w3.org/2001/10/synthesis' xml:lang='en-US'>" +
        $"<voice name='Microsoft Server Speech Text to Speech Voice ({language}, {voice})'>" +
        $"{message}" +
        "</voice></speak>";

    activity.Speak = body;
    return activity;
}
- Run the bot
- Type input to the bot, and notice the output is not spoken
- Speak to the bot, and notice the output is spoken
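For comparison on the client side, the SSML envelope that the C# Speak helper assigns to activity.Speak can be mirrored in JavaScript. This is purely illustrative (the function name is my own); the voice-name format and envelope are copied from the helper above:

```javascript
// Illustrative JavaScript mirror of the C# Speak helper's SSML envelope.
// Produces the same string the bot assigns to activity.Speak.
function buildSpeak(message, language = 'en-US', voice = 'JessaRUS') {
  return (
    "<speak version='1.0' xmlns='https://www.w3.org/2001/10/synthesis' xml:lang='en-US'>" +
    `<voice name='Microsoft Server Speech Text to Speech Voice (${language}, ${voice})'>` +
    message +
    '</voice></speak>'
  );
}
```

Inspecting this string in the browser (e.g. via the activity payload in the network tab) confirms the Speak property is populated regardless of how the user supplied input.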
Expected behavior
The bot's output is spoken whenever the activity's Speak property is set, regardless of the mode of input.
Additional context
I have found #2474, but the bot will still only speak after spoken input, even if I get the user to speak first. I.e. I ask them to press the button and talk, and the next response is spoken; but if the input after that is typed, the output is not spoken.
[Bug]
Issue Analytics
- State:
- Created 4 years ago
- Comments: 9
Top GitHub Comments
This is the default behavior. I've managed to trigger the bot to speak in a given scenario through the Microsoft Healthcare Bot platform.
I have a feeling you’ll need to write a dialog for the bot in C#. Can you link your entire Bot class in your C# project?
@jbgh2, as you already know from the above conversation, the user must have clicked and provided speech input before the bot’s text response is spoken back to the user. This requirement extends from the browser and, unfortunately, there is no work-around at this time.
As you are also aware, future development (issue #2211) looks to mitigate this browser requirement with respect to Web Chat, but at this time there is no specific ETA. Development has been pushed out to R9; however, you should consider this "subject to change".
With regard to the "WEB_CHAT" speech actions you reference further up (for example, 'START_SPEAKING'), these provide a means of responding to specific actions when the user has provided speech input. For example, you may need some page event to fire when the user starts speaking, or you may want to send an event back to the bot when the user has stopped speaking. In and of themselves, these speech actions do not start or stop the browser speaking any user or bot text.
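Those actions can be observed with a custom store middleware passed to Web Chat's createStore. A minimal sketch; the exact action type strings below are my assumption based on the names mentioned above, so verify them against the actions your build actually dispatches:

```javascript
// Sketch: observe Web Chat's speech-related store actions via a Redux-style
// middleware. The action type strings are assumptions based on the names
// discussed above; log incoming actions in your own build to confirm them.
const speechLoggingMiddleware = () => next => action => {
  if (action.type === 'WEB_CHAT/START_SPEAKING_ACTIVITY') {
    console.log('Bot started speaking'); // e.g. fire a page event here
  } else if (action.type === 'WEB_CHAT/STOP_SPEAKING_ACTIVITY') {
    console.log('Bot stopped speaking'); // e.g. notify the bot here
  }
  return next(action); // always forward the action unchanged
};

// Usage with Web Chat (hypothetical wiring):
//   const store = window.WebChat.createStore({}, speechLoggingMiddleware);
//   window.WebChat.renderWebChat({ directLine, store /* ... */ }, el);
```

Note this only reacts to speech-related actions; per the comment above, dispatching such actions does not itself make the browser speak.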
As you have already commented on the above linked issue, I am going to close this as answered.