Voice-response systems that employ automatic speech recognition (ASR) technology are becoming increasingly common in everyday life. ASR is a technology that allows machines to recognize human speech. Applications of voice-response technology include, for example, automated customer service call centers of business enterprises, which respond to a telephone caller's speech, and voice-response systems in automobiles, homes, businesses and entertainment venues.
At the heart of every voice-response system are an automatic speech recognizer and a speech application. A “speech application” is a speech-enabled software application, separate from the recognizer, which determines what the system does in response to recognized speech from the recognizer. The speech application receives recognized speech from the recognizer, executes some function or functions based on the speech inputs according to the speech application's internal logic, and generates appropriate output. The speech application further generates various audible prompts to the user, which may be, for example, synthesized (machine-generated) speech.
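The division of labor between recognizer and speech application described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `fake_recognizer` is a hypothetical stand-in that treats text as if it were audio, and the `SpeechApplication` class, its handler names and the account balance are invented rather than taken from any particular product.

```python
from typing import Callable, Dict


def fake_recognizer(audio: str) -> str:
    """Hypothetical recognizer stub: the 'audio' is already text, so
    recognition is reduced to trimming whitespace and lower-casing."""
    return audio.strip().lower()


class SpeechApplication:
    """Separate from the recognizer: decides what to do with recognized speech."""

    def __init__(self) -> None:
        self.balance = 125.50  # illustrative data only
        # Internal logic: recognized phrase -> function to execute.
        self.handlers: Dict[str, Callable[[], str]] = {
            "balance": self._balance,
            "goodbye": self._goodbye,
        }

    def _balance(self) -> str:
        return f"Your balance is {self.balance:.2f} dollars."

    def _goodbye(self) -> str:
        return "Goodbye."

    def respond(self, recognized: str) -> str:
        """Return the text of the prompt to play back to the caller."""
        handler = self.handlers.get(recognized)
        if handler is None:
            return "Sorry, I did not understand."
        return handler()


if __name__ == "__main__":
    app = SpeechApplication()
    print(app.respond(fake_recognizer("  Balance ")))
```

In a real system the returned text would typically be passed to a speech synthesizer or mapped to a recorded prompt; the sketch stops at the text.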
The processing logic which speech applications provide is in the form of “dialog flows”; every speech application includes one or more dialog flows. A dialog flow is a set of two or more states in a human-machine dialog (“dialog states”) in some logical relationship to each other, which define how a speaker's speech is processed. A dialog state may be a recognition state, which is a state that includes a prompt to request the speaker to speak, a grammar to recognize what the speaker says, and one or more actions to take based on what was recognized.
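As a rough sketch of this structure, a dialog flow can be modeled as a table of recognition states, each holding a prompt, a grammar and the actions to take for each recognized phrase. The names `RecognitionState`, `run_dialog` and `FLOW`, the representation of a grammar as a set of accepted phrases, and the choice of "go to another state" as the only kind of action are all simplifying assumptions made here for illustration; real grammars and actions are considerably richer.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class RecognitionState:
    prompt: str                         # what the system says to the speaker
    grammar: Set[str]                   # phrases this state can recognize
    actions: Dict[str, Optional[str]]   # phrase -> next state (None ends the dialog)


def run_dialog(flow: Dict[str, RecognitionState],
               start: str,
               utterances: Iterable[str]) -> List[str]:
    """Walk the flow with pre-recognized utterances; return the prompts played."""
    prompts: List[str] = []
    inputs = iter(utterances)
    state_name: Optional[str] = start
    while state_name is not None:
        state = flow[state_name]
        prompts.append(state.prompt)
        heard = next(inputs, None)
        if heard is None:
            break                       # speaker stopped responding
        if heard not in state.grammar:
            continue                    # out of grammar: stay here and re-prompt
        state_name = state.actions[heard]
    return prompts


# A two-state flow, purely illustrative.
FLOW = {
    "main": RecognitionState(
        prompt="Say balance or transfer.",
        grammar={"balance", "transfer"},
        actions={"balance": "again", "transfer": "again"},
    ),
    "again": RecognitionState(
        prompt="Anything else? Say yes or no.",
        grammar={"yes", "no"},
        actions={"yes": "main", "no": None},
    ),
}

if __name__ == "__main__":
    for line in run_dialog(FLOW, "main", ["weather", "balance", "no"]):
        print(line)
```

With the utterances shown, the out-of-grammar word "weather" causes the first prompt to be repeated before the flow advances to the second state and ends.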
Although recent years have been marked by a wide variety of new speech applications, the process and technology for designing and building speech applications have lagged behind. That is, the process of designing and building speech applications has, prior to the present invention, been slow, difficult, tedious, time-consuming and prone to errors. In general, the process typically has been as follows.
Initially in the design of a speech application, a (human) voice user interface (VUI) designer writes a functional specification for the speech application. The functional specification is a document, written in a human natural language (e.g., English), that specifies at a high level what the speech application will do. In particular, the functional specification specifies the various dialog flows that will form the speech application, including the required prompts, grammars, processing logic, error handling logic, etc. The VUI designer then provides the functional specification to a (human) speech application developer, who is an expert in writing the software to implement speech applications. The developer then begins to implement the speech application in software, using an appropriate language such as VoiceXML.
A problem with this process, however, is that it is not conducive to a short or efficient design/development process. Typically the VUI designer is not very familiar with speech application software code. As a result, the VUI designer is unable to have meaningful input in the design process after providing the specification to the developer, until the developer has generated a working prototype of the speech application. Consequently, any flaws or design issues may not be identified until substantial time and effort have been spent on development of the application. Once a prototype has been created by the developer, the VUI designer may make changes to the functional specification, based on feedback from the developer. The developer would then modify the speech application code to implement those changes. This cycle may continue through several iterations, resulting in a long and tedious design/development process. Often the implementation of the speech application will diverge from what the VUI designer intended; however, that divergence may go unnoticed until substantial time and effort have been spent on development. This problem may be exacerbated by the fact that the VUI designer and the application developer may work for different business enterprises (e.g., corporate partners in the design/development of a particular product).
Existing approaches to speech application development include VoiceXML coding in a code editing environment, such as V-Builder 2.0 from Nuance Communications of Menlo Park, Calif., or Windows Notepad. However, only very technically knowledgeable individuals who can write code can create applications or prototypes in such an environment.
Existing approaches also include graphical, call-flow-oriented development with the ability to drag and drop graphical icons. However, this development approach has been available primarily within legacy, non-VoiceXML tools and has been limited to creating applications in non-standard languages, rather than in VoiceXML. The only solutions known to provide this approach for VoiceXML applications sharply divide the prototyping process from the full deployment process; as such, once a developer moves into deployment mode, the developer's prototyping options are greatly limited with these solutions.
What is needed, therefore, is a tool that overcomes shortcomings of the prior art, including by making the process of designing and developing a speech application simpler, more efficient, less time-consuming and less error-prone.