1. Field of the Invention
This invention relates generally to voice command platforms, voice browsers (e.g., a VoiceXML browser), and voice applications that execute on a voice browser. Specifically, the invention relates to a method of handling overlapping grammar elements between the browser and the application. The invention has two aspects: one involves weighting application-level grammar relative to browser-level grammar, and the other involves specifying which grammar elements must be handled by the browser. The method enhances the user experience by increasing the likelihood that the user's speech input is correctly interpreted and that the event the user intended is executed.
2. Description of Related Art
A voice command platform is a computer-implemented system that provides an interface between speech communication with a user and voice command applications. Generally, a person can call the voice command platform and, by speaking commands, browse through voice command applications and the menu items within them. The voice command platform allows the user to access and interact with information maintained by the voice command applications. The voice command platform includes a browser that executes the logic defined by the voice command application. The browser includes an interpreter that interprets this logic (such as the logic in VoiceXML documents) so that the voice command platform can communicate effectively with the user through speech.
The voice command application can be written or rendered in any of a variety of computer languages. One such language is VoiceXML (or simply “VXML”), an XML-based markup language for creating voice user interfaces, defined through the World Wide Web Consortium (W3C). VXML is a tag-based language similar to the HyperText Markup Language (HTML) that underlies most Internet web pages. Other analogous languages, such as SpeechML, VoxML, or SALT (Speech Application Language Tags), are available as well.
An application written in VoiceXML can be accessed through a VoiceXML interpreter, otherwise known as a voice browser. A VoiceXML application uses speech recognition grammars, recorded audio files, plain text (which is read by the browser's text-to-speech engine) and VoiceXML documents to engage users in a speech-based interaction. The grammar determines which user utterances the speech recognition engine will accept at any given point in the application. Grammar is typically specified at the application level, but the voice browser also has browser-level grammar. At both levels, the grammar consists of a set of individual grammar elements. The global grammar elements in the voice browser are assigned to specific tasks, which are handled by event handling routines consistently across all applications. Examples of global grammar elements include “pause”, “help” and “exit”. These elements may exist at the application level as well as at the browser (global) level.
When the browser recognizes a user utterance as certain grammar elements, such as “pause” or “bookmark”, an event handler is triggered; for example, when the user says “pause”, the application pauses. Depending on the grammar element and the application state, the event is handled by either the voice browser or the application. For example, having the browser handle “exit” results in a consistent experience across all applications. On the other hand, it makes sense for individual applications to handle requests for help, since they can provide context-sensitive help.
To avoid presenting the end user with an out-of-grammar error, the voice browser must be allowed to take over with a browser-level event handler when an event is not handled at the application level. For example, the user may say “help” in an application whose developer did not support a “help” command. The dialogs below contrast handling this situation as an error with allowing the browser to intervene. The first dialog shows a “help” request in an application that does not support “help”:
System: Say the destination city followed by the name of the state.
User: Help
System: I'm sorry. I didn't understand that. Please say the destination city followed by the name of the state.
The following dialog shows how the “help” request could be handled at the browser level when the application does not support “help”:
System: Say the destination city followed by the name of the state.
User: Help
System: There is no help information available for this situation. If you would like help using <name of browser>, just hang on. Otherwise, say “back” to be returned to Flight Tracker.
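The fallback behavior in the second dialog can be sketched as a simple dispatch rule: try the application's handler for the recognized grammar element and, if there is none, let the browser's handler respond; only when neither level handles the element does the caller hear an out-of-grammar error. This is a minimal Python sketch under that assumption, not the platform's actual implementation. The names `dispatch`, `OUT_OF_GRAMMAR`, and `BROWSER_HANDLERS` are hypothetical, and the sketch models only the fallback case, not conflicts between overlapping grammar elements.

```python
from typing import Callable, Dict, Optional

# A handler produces the system's spoken response (hypothetical representation)
Handler = Callable[[], str]

OUT_OF_GRAMMAR = "I'm sorry. I didn't understand that."

def dispatch(element: str,
             app_handlers: Dict[str, Handler],
             browser_handlers: Dict[str, Handler]) -> str:
    """Handle a recognized grammar element, falling back to the browser."""
    handler: Optional[Handler] = app_handlers.get(element)
    if handler is None:
        # The application does not handle this element: let the browser intervene
        handler = browser_handlers.get(element)
    if handler is None:
        # Neither level handles it: the caller hears an out-of-grammar error
        return OUT_OF_GRAMMAR
    return handler()

# Hypothetical browser-level handler mirroring the second dialog
BROWSER_HANDLERS: Dict[str, Handler] = {
    "help": lambda: "There is no help information available for this situation.",
}

print(dispatch("help", {}, BROWSER_HANDLERS))  # browser-level help message
print(dispatch("help", {}, {}))                # out-of-grammar error, as in the first dialog
```

Checking the application first keeps context-sensitive help with the application whenever it exists; the browser acts only as a safety net.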
Furthermore, the acceptable grammar in an application may include utterances that sound like global grammar elements, and both the application level and the browser level may support the same grammar element, such as “help” or “bookmark”. Thus, two types of conflicts can occur between application-level and browser-level grammar elements. The first is a recognition conflict, in which the speech recognition engine must determine whether the user's utterance matches an element of the global grammar or of the application grammar. The second arises when the user's utterance is clearly recognized (e.g., “help”) but matches an element of both the global grammar and the application grammar. In this situation, the question is whether the utterance should be handled (that is, processed by an event handler) at the global (browser) level or at the application level.
In the latter type of conflict, where the utterance is acceptable grammar at both the application level and the browser level, whether the browser or the application should handle the resulting event depends on the particular grammar element in question. For example, having the browser handle “exit” in all instances results in a more consistent experience across all applications. On the other hand, it makes sense for individual applications to handle requests for “help”, since such requests usually depend on context-sensitive information that the application is better suited to use in responding.
This problem of conflicts between application-level and browser-level grammar is magnified by the fact that one of the more popular applications for a VoiceXML-compliant browser is a voice portal. A voice portal is typically an application, or a predefined set of applications, residing on a common voice command platform that allows callers to retrieve information such as email, stock quotes, traffic reports, and weather reports. The voice portal and other third-party applications may coexist on the same voice browser, but their global grammar requirements may differ. For example, the browser may support bookmarking, but applications such as voice-activated call centers or voice-activated dialing may not wish to do so.
In a primary aspect, the present invention addresses the recognition type of conflict described previously, that is, situations where the user, in response to a prompt in the application, provides speech input that somewhat resembles both a global grammar element and an application-level grammar element. For example, the speech input may somewhat resemble “bookmark” or “help”. The invention relies on the observation that, given the particular context or state of the application when the input is provided, the likelihood that the user intends to invoke the associated browser-level event handling routine can be ascertained, or at least estimated. The invention provides a weighting feature by which the global grammar elements are weighted relative to the application-level grammar. The weighting does two things: (a) it helps determine whether the utterance is recognized by the speech recognition engine at all, and (b) it influences the confidence level of the recognition. Application developers programmatically weight the global grammar relative to the application-level grammar, and the weighting can change depending on the state of the application.
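One way to picture the weighting feature is as a per-state multiplier applied to the recognizer's scores for global grammar elements before the best match is chosen. The Python sketch below assumes a recognizer that returns a raw match score for each candidate element; the names `STATE_WEIGHTS` and `recognize`, the state names, the threshold value, the scores, and the normalized-share definition of confidence are all hypothetical illustrations, not the scoring method specified by the invention. It shows both effects described above: the weight can decide whether an utterance clears the recognition threshold, and it shifts the confidence of the result.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical per-state weights for global (browser-level) grammar elements.
# A weight below 1.0 makes global elements harder to match in that state.
STATE_WEIGHTS: Dict[str, float] = {
    "main_menu": 1.0,         # user plausibly wants "help" or "bookmark" here
    "destination_city": 0.5,  # user is most likely naming a city
}

def recognize(raw_scores: Dict[str, float],
              global_elements: Set[str],
              state: str,
              threshold: float = 0.5) -> Optional[Tuple[str, float]]:
    """Return (element, confidence) for the best weighted match, or None."""
    weight = STATE_WEIGHTS.get(state, 1.0)
    # Scale only the global elements; application elements keep their raw score
    weighted = {
        element: score * (weight if element in global_elements else 1.0)
        for element, score in raw_scores.items()
    }
    if not weighted:
        return None
    best = max(weighted, key=lambda e: weighted[e])
    # (a) the weighted score decides whether the utterance is recognized at all
    if weighted[best] < threshold:
        return None
    # (b) confidence: the best weighted score as a share of all weighted scores
    total = sum(weighted.values())
    return best, (weighted[best] / total if total > 0 else 0.0)

GLOBAL_ELEMENTS = {"help", "bookmark"}
scores = {"help": 0.6, "Helena": 0.55}  # hypothetical recognizer output
print(recognize(scores, GLOBAL_ELEMENTS, "main_menu"))         # "help" wins, confidence about 0.52
print(recognize(scores, GLOBAL_ELEMENTS, "destination_city"))  # "Helena" wins, confidence about 0.65
```

With the same acoustic evidence, the application state alone changes which element wins; and an utterance scoring 0.7 for “help” is recognized in the main menu but rejected in the destination-city state, where its weighted score of 0.35 falls below the threshold.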
In a related aspect of the present invention, a solution is provided for the situation where a voice browser and the applications running on that browser have overlapping grammar (e.g., both support “help”). The solution takes the form of a default/override table, which dictates whether the browser-level grammar or the application-level grammar is used to respond to the user's utterance.
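A default/override table could be represented as a mapping from each global grammar element to a default handling level plus a flag saying whether an application may override that default. The Python sketch below is a hypothetical illustration: the entries (browser-only “exit”, application-default “help”, overridable “bookmark”), the names `DEFAULT_OVERRIDE_TABLE`, `TableEntry`, and `resolve`, and the rule of falling back to the browser when the selected application has no handler are assumptions chosen to match the examples in this section, not the actual contents or semantics of the table.

```python
from enum import Enum
from typing import Dict, NamedTuple, Set

class Level(Enum):
    BROWSER = "browser"
    APPLICATION = "application"

class TableEntry(NamedTuple):
    default: Level     # which level handles the element by default
    overridable: bool  # whether an application may override the default

# Hypothetical entries modeled on the examples in this section
DEFAULT_OVERRIDE_TABLE: Dict[str, TableEntry] = {
    "exit": TableEntry(Level.BROWSER, overridable=False),
    "help": TableEntry(Level.APPLICATION, overridable=True),
    "bookmark": TableEntry(Level.BROWSER, overridable=True),
}

def resolve(element: str, app_supports: Set[str], app_overrides: Set[str]) -> Level:
    """Decide which level handles a recognized grammar element."""
    entry = DEFAULT_OVERRIDE_TABLE.get(element)
    if entry is None:
        # Not a global element, so only the application grammar applies
        return Level.APPLICATION
    level = entry.default
    if entry.overridable and element in app_overrides:
        # The application flips the default for this element
        level = Level.APPLICATION if level is Level.BROWSER else Level.BROWSER
    if level is Level.APPLICATION and element not in app_supports:
        # No application handler: fall back to the browser
        return Level.BROWSER
    return level

# A hypothetical application that supports "help", "exit", and "bookmark"
# and asks to take over "exit" and "bookmark"
supports = {"help", "exit", "bookmark"}
overrides = {"exit", "bookmark"}
print(resolve("exit", supports, overrides))      # Level.BROWSER (not overridable)
print(resolve("bookmark", supports, overrides))  # Level.APPLICATION
print(resolve("help", set(), set()))             # Level.BROWSER (no application handler)
```

Marking “exit” as non-overridable is what preserves the consistent cross-application experience described earlier, while the override flag lets an application claim elements such as “bookmark” when it has its own handling.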