Techniques for accomplishing automatic speech recognition (ASR) are well known. Among known ASR techniques are those that use grammars. A grammar is a representation of the language or phrases expected to be used or spoken in a given context. In one sense, then, an ASR grammar typically constrains the speech recognizer to a vocabulary that is a subset of the universe of potentially spoken words. Grammars may also include subgrammars. An ASR grammar rule can then be used to represent the set of “phrases,” or combinations of words, from one or more grammars or subgrammars that may be expected in a given context. “Grammar” may also refer more generally to a statistical language model (where the model represents phrases), such as those used in language understanding systems.
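The relationship between subgrammars, a grammar rule, and the resulting set of expected phrases can be sketched as follows. This is a minimal illustration only: the subgrammar names, the word lists, and the helper functions expand and recognize are hypothetical and are not drawn from any particular ASR engine, which would typically score acoustic hypotheses rather than match strings exactly.

```python
from itertools import product

# Hypothetical subgrammars: each name maps to the words it allows.
SUBGRAMMARS = {
    "command": ["get", "read"],
    "topic": ["news", "weather", "traffic"],
}

# Hypothetical grammar rule: a sequence of subgrammar names.
RULE = ["command", "topic"]


def expand(rule, subgrammars):
    """Return the set of phrases the rule accepts in this context."""
    choices = [subgrammars[name] for name in rule]
    return {" ".join(words) for words in product(*choices)}


def recognize(hypothesis, phrases):
    """Keep a hypothesis only if it falls inside the grammar's phrase set."""
    return hypothesis if hypothesis in phrases else None


# Two commands times three topics gives six expected phrases.
PHRASES = expand(RULE, SUBGRAMMARS)
```

Under these assumptions, recognize("get weather", PHRASES) keeps the phrase, whereas a phrase outside the grammar, such as "order pizza", is rejected.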
Products and services that utilize some form of ASR methodology have recently been introduced commercially. Desirable attributes of complex ASR services include high recognition accuracy; robustness, to enable recognition where speakers have differing accents or dialects and/or where background noise is present; the ability to handle large vocabularies; and natural language understanding. To achieve these attributes, ASR techniques and engines typically require computer-based systems having significant processing capability.
In a standard speech recognition/synthesis system, a database of utterances is maintained for administering a predetermined service. In one example of operation, a user may utilize a telecommunication network to communicate utterances to the system. In response, the utterances are recognized utilizing speech recognition, and processing takes place utilizing the recognized utterances. Thereafter, synthesized speech is output in accordance with the processing. In one particular application, a user may verbally communicate a street address to the speech recognition system, and driving directions may be returned utilizing synthesized speech.
In order to facilitate interaction between the user and a system that is available through the Internet, a specially adapted voice markup language, VoiceXML, is employed. VoiceXML allows for the creation of voice dialogs, which may be stored on any Web site and referenced by URL just like HTML documents. In use, the user may call a phone number and interact with a VoiceXML application through speech recognition, text-to-speech (TTS), and recorded prompts. To accomplish this, a developer creates a VoiceXML script, which is stored on the Web site and executed by a VoiceXML browser. The user places a call and is connected to a program called a voice browser, or “interpreter.” The voice browser may fetch the VoiceXML document at a specified URL, and the user may then interact with the document using speech recognition as it is interpreted by the voice browser. The markup defined in VoiceXML is a specific instance of the Extensible Markup Language (XML), the strategic data definition language for the Internet.
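Because a VoiceXML document is ordinary XML, a voice browser can be thought of as parsing the document and then acting on its elements. The sketch below assumes the VoiceXML 2.0 namespace and its menu, prompt, and choice elements; the text above does not name a VoiceXML version. The document content, the target URIs, and the helper menu_choices are hypothetical, and a real voice browser would also handle telephony, recognition, and prompt playback.

```python
import xml.etree.ElementTree as ET

# Namespace of VoiceXML 2.0 (assumed version; not specified in the text).
VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical VoiceXML document, as a voice browser might fetch it by URL.
DOCUMENT = """
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <menu id="main">
    <prompt>Say news, weather, or traffic.</prompt>
    <choice next="news.vxml">news</choice>
    <choice next="weather.vxml">weather</choice>
    <choice next="traffic.vxml">traffic</choice>
  </menu>
</vxml>
"""


def menu_choices(vxml_text):
    """Map each spoken choice in the document to the URI it leads to."""
    root = ET.fromstring(vxml_text.strip())
    return {
        choice.text.strip(): choice.get("next")
        for choice in root.iter(f"{{{VXML_NS}}}choice")
    }


CHOICES = menu_choices(DOCUMENT)
```

Here CHOICES maps each phrase the caller may say to the document that would be fetched next, which mirrors how HTML links are followed by a conventional Web browser.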
Prior art FIG. 1 illustrates a particular example 10 of the use of a conventional voice browser, showing the manner in which a user may transition among states. As shown, a main menu 12 may be provided in the form of a plurality of prompts, each corresponding to one of many states of the particular voice browser. In one example, a news prompt 14, a weather prompt 16, and a traffic prompt 18 may be provided. A user may begin by verbally selecting the news prompt 14, after which he or she may select the weather prompt 16. At this or any other point, the user may verbalize a “go-back” command, which prompts the voice browser to return to a previous state. For example, at the weather prompt 16, the go-back command would return the voice browser to the news prompt 14. Of course, the functionality of the go-back command may vary based on the current state. For example, from the traffic prompt 18, the go-back command may carry the user to a city prompt 20, which in turn may verbalize traffic 22 to the user based on the city that is chosen.
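The conventional arrangement described above can be sketched as a hand-written table that names a go-back target for each state. The state names, the table GO_BACK_TARGET, and the function handle_command are hypothetical. Only the weather and traffic entries come from the example; the news entry is an assumption made for illustration.

```python
# Hypothetical state names loosely following FIG. 1.
STATES = {"main_menu", "news", "weather", "traffic", "city"}

# Go-back targets written by hand, one entry per state.
GO_BACK_TARGET = {
    "news": "main_menu",  # assumption: not stated in the example
    "weather": "news",    # from the example: weather goes back to news
    "traffic": "city",    # from the example: traffic goes back to city
}


def handle_command(state, command):
    """Return the next state for a spoken command in the given state."""
    if command == "go-back":
        if state not in GO_BACK_TARGET:
            # The designer supplied no target for this state.
            raise ValueError(f"no go-back target specified for {state!r}")
        return GO_BACK_TARGET[state]
    # Saying the name of a prompt selects that prompt.
    if command in STATES:
        return command
    # Unrecognized commands leave the state unchanged.
    return state
```

In this sketch, a go-back command issued from a state that has no table entry, such as the city state, raises an error, because the target must be written out separately for every state.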
One problem with designing such go-back features is the need to specify, for each state, the prompt to which the voice browser must transition. This requirement can be quite cumbersome during the design phase of a voice application.