Today, one prevalent example of speech-enabled user interfaces is a voice response system.
Voice response systems generally permit callers to be connected with an automated service that directs the user through a variety of menu options, permitting the user to transmit information to, and receive information from, a computer system. Applications for such voice response systems include banking and other financial applications, as well as other operations suitable for automated interaction, such as those of utilities, cable services, and telephone companies. In the banking context, users may dial a designated number and navigate through a set of menu choices permitting the user to receive information such as account balances and the status of various transactions, and possibly to perform financial transactions such as transferring money from one account to another.
Generally, traditional voice response systems have employed a structured menu. Such systems present the user with a fixed set of menus, each menu having a specific set of choices as to which function the user would like to perform next. Since the total number of functions in such a system is typically too large to recite to a caller within a single level of menu choices, the menu system is typically structured as a menu hierarchy so as to funnel the user toward a desired destination activity. For example, if a user wished to determine the current interest rate on a savings account, a first menu may comprise "account information," "general information," and "transfer to customer service" options. Once the user selects "account information" in the first menu, a second menu may present options which include "checking information," "savings information," and "loan information." The user would then select "savings information" from the second menu. A third menu may then offer choices including "savings balance," "savings deposits," "savings withdrawals," and "savings rates."
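The menu hierarchy in the banking example can be pictured as a small tree. The following is a minimal sketch only, assuming a nested-dictionary representation; the names MENU_TREE, navigate, and options are hypothetical and do not come from any particular voice response product. Sub-menus whose contents the example does not list are left empty.

```python
# Hypothetical menu tree mirroring the banking example in the text.
# A dict is a menu level; None marks a destination activity (a leaf).
MENU_TREE = {
    "account information": {
        "checking information": {},  # contents not listed in the example
        "savings information": {
            "savings balance": None,
            "savings deposits": None,
            "savings withdrawals": None,
            "savings rates": None,
        },
        "loan information": {},      # contents not listed in the example
    },
    "general information": {},       # contents not listed in the example
    "transfer to customer service": None,
}


def navigate(tree, selections):
    """Follow a sequence of menu selections and return the node reached."""
    node = tree
    for choice in selections:
        if not isinstance(node, dict) or choice not in node:
            raise KeyError(f"'{choice}' is not an option at this menu level")
        node = node[choice]
    return node


def options(tree, selections=()):
    """List the choices that would be recited after the given selections."""
    node = navigate(tree, selections)
    return list(node) if isinstance(node, dict) else []


if __name__ == "__main__":
    path = ["account information", "savings information"]
    print(options(MENU_TREE))        # first-level menu
    print(options(MENU_TREE, path))  # third-level menu for savings
```

In this sketch the caller needs three selections to reach "savings rates," which illustrates how a fixed hierarchy funnels the user one level at a time.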
Speech recognition enabled voice response systems allow callers to submit information to a system verbally instead of using a touch-tone keypad. A simplistic approach to utilizing this technology would involve an application accepting an utterance from a caller naming the desired transaction. This would work well if the caller knew all of the transaction phrases the system supports. A transaction phrase is the word or set of words which identifies a specific action the voice response unit (VRU) can perform. For example, "checking balance", "IRA rates", and "savings deposits" are typical transaction phrases for a banking application. However, callers are generally not familiar with the set of transactions the VRU supports, or with the set of phrases and synonym phrases which the application developer chose to describe the transactions in the application.
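The limitation of the simplistic approach can be seen in a short sketch. It assumes the recognizer has already converted the utterance to text, and the names SUPPORTED_PHRASES and recognize_exact are hypothetical, chosen only for illustration.

```python
# Hypothetical set of transaction phrases, taken from the banking example.
SUPPORTED_PHRASES = ("checking balance", "IRA rates", "savings deposits")


def recognize_exact(utterance, supported=SUPPORTED_PHRASES):
    """Return the supported phrase that the utterance names exactly, or None.

    Matching ignores letter case and extra whitespace, but nothing else.
    """
    normalized = " ".join(utterance.lower().split())
    for phrase in supported:
        if phrase.lower() == normalized:
            return phrase
    return None


if __name__ == "__main__":
    print(recognize_exact("checking balance"))  # a full phrase is recognized
    print(recognize_exact("rates"))             # a partial phrase is not
```

A caller who says only "rates" gets no match here, even though a related transaction exists, which is the gap described above.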
There is also a simple method of implementing a speech enabled user interface which involves understanding specific spoken words in place of dual tone multi-frequency (DTMF) keys; however, this approach lacks sophistication and flexibility.
A more sophisticated approach to defining an application utilizing speech recognition would be not only to support exact transaction phrases spoken by a caller but also to possess the ability to respond to partial phrases and related words. If the caller has ambiguously identified a desired transaction, the system will preferably offer the caller a list of supported transactions related to the caller's utterance. For example, if the system supports the transactions "checking rates", "mortgage rates", and "checking balance", and a caller speaks "rates", the system will offer the choices "checking rates" and "mortgage rates".
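The behavior in this example can be sketched as follows. This is an illustrative sketch under stated assumptions, not a description of any existing system: the utterance is assumed to arrive as already-recognized text, the names TRANSACTIONS, RELATED_WORDS, and match_transactions are hypothetical, and the single related-word entry ("interest" for "rates") is invented for illustration.

```python
# Transactions from the example in the text.
TRANSACTIONS = ["checking rates", "mortgage rates", "checking balance"]

# Hypothetical related-word table (an assumption, not from the text):
# maps a word a caller might say to a word used in the transaction phrases.
RELATED_WORDS = {"interest": "rates"}


def match_transactions(utterance, transactions=TRANSACTIONS,
                       related=RELATED_WORDS):
    """Return the transactions related to an utterance.

    An exact phrase yields just that transaction. Otherwise every
    transaction containing all of the spoken words is offered.
    """
    spoken = [related.get(word, word) for word in utterance.lower().split()]
    if not spoken:
        return []
    exact = [t for t in transactions if t.lower().split() == spoken]
    if exact:
        return exact
    return [t for t in transactions
            if all(word in t.lower().split() for word in spoken)]


if __name__ == "__main__":
    for said in ("rates", "checking", "checking balance", "interest"):
        print(said, "->", match_transactions(said))
```

A result with one entry can be executed directly, a result with several entries would be read back to the caller as a list of choices, and an empty result would call for a reprompt.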
However, coding such an application employing the systems of the prior art requires an inordinate amount of time and effort. An example is considered in which an application supports twenty transactions and a comparison is made between its implementation as a touch tone application, a simplistic single input speech application, and a more sophisticated speech application. In the touch tone application implementation, the developer must program the actions for each of the twenty transactions. In addition, one top level menu and three or four sub-menus would generally be created allowing the caller to select their transactions.
In the case of a single input speech application, a developer has to code the actions for each of the twenty transactions. In addition, he has to define a phonetic grammar to recognize the twenty different phrases.
In the case of the more sophisticated speech recognition enabled application, to code the desired application using the existing tools, the developer still has to code the actions for each of the twenty transactions, and he must define the phonetic grammar to recognize the twenty different phrases. Further, he must add to the grammar the partial phrases and related words and code actions which comprise prompting with a pertinent offering to direct the caller to a transaction which the VRU supports. For an application supporting twenty transactions with two to three words in each phrase and two to three synonyms there would be approximately one hundred additional actions for a developer to code. Obviously, this preferred, sophisticated speech enabled application is more difficult for a developer to code than the touch tone application or a single input speech application.
Therefore, it is a problem in the art that coding a preferred, sophisticated speech recognition enabled application for a voice response system is excessively difficult and time consuming.