The present invention relates to computer software. The invention relates more specifically to approaches for computing and displaying statistical information about the performance of interactive speech applications.
Computer-based interactive speech applications provide automated interactive communications. For example, a computer-based interactive speech application may be used in a telephone system to automatically answer an incoming call and engage in a dialogue with the caller in order to route the call, provide requested information, or process a transaction. Using speech recognition technology, the application is able to convert a caller's speech into a textual representation and thereby understand what the caller is saying. These applications are also sometimes categorized under the general heading of interactive voice response (IVR) applications. Where they involve the use of speech recognition technology, these applications are defined here under the narrower term, “interactive speech applications”.
In the past, developing interactive voice response applications that use speech recognition technology has been difficult for the enterprises that implement these applications, their programmers, and others. The software development tools and application testing tools available for use in the development process have been less than satisfactory. One recent significant improvement in the development process involves the use of re-usable software components, commercially known as DialogModules™, that a developer may interconnect to produce effective interactive speech applications quickly. This modular approach is described in co-pending U.S. patent application Ser. No. 09/081,719, filed May 6, 1998, entitled “System and Method for Developing Interactive Speech Applications.”
Although the modular approach represents a significant advance in the development process, there is still a need for an effective way to determine whether a completed interactive speech application is working effectively. Generally, a developer or programmer prepares an interactive speech application by hand-writing source code, assembling pre-defined objects, or joining modular components using a system such as DialogModules™. The developer compiles the program, installs it on a test system, and verifies that it operates correctly in response to test telephone calls. The program may be debugged and rewritten over a period of time. The completed application is then launched and used on a “live” basis.
Even though a completed application operates correctly, meaning that there are no errors in the program code, it may not perform in an optimal manner in a live environment. Detecting performance problems is difficult. Performance problems include a variety of issues or defects, such as the inability of repeated callers to understand a particular prompt or option, callers becoming “lost” in the logical flow of the application, etc. In the past, developers have received feedback on such problems in a manual way, such as by callers calling an institution to complain about its voice response system.
Thus, there is a need for an automated way to provide feedback on the usability or effectiveness of a completed application.
In addition, there is a need for tools that can be used to identify potential problems with particular components of an application, such as vocabulary, prompts, and call flow.
There is also a need for a mechanism that can analyze performance of an application, compute statistical information that reveals the characteristics of the performance, and provide reports containing information that can be used to “tune” or improve performance of the application.
There is also a need for a mechanism that can provide analytical statistical information about a call that involves use of an interactive speech application.
The foregoing needs, and other needs and objects that will become apparent from the following description, are achieved by the present invention, which comprises, in one embodiment, a method for generating information useful in improving performance of an interactive speech application program, the method comprising the steps of: receiving, from an event log that is generated by the interactive speech application during a call from a caller, one or more event values associated with one or more calls, wherein each of the event values describes a task carried out by the interactive speech application during the call and in response to interaction with the caller; and generating a statistical summary of the performance of the interactive speech application based on the event values.
One feature of this aspect is modifying one or more parameters of the interactive speech application, to improve its performance, based on the statistical summary.
Another feature of the aspect is generating a report describing transaction results for each module of the interactive speech application.
In another feature, generating the statistical summary further comprises generating a report of results of attempts to collect primary module data from the caller.
In another feature, generating the statistical summary further comprises generating a report describing recognition context results.
According to another feature, generating the statistical summary further comprises generating a report describing vocabulary results.
According to still another feature, generating the statistical summary further comprises generating a report describing context statistics.
In another feature, generating a statistical summary further comprises: reading a current event value from the event log; determining an identity of a call associated with the current event value; processing call information values associated with the current event value to produce statistical data associated with each call; iteratively repeating the reading, determining, and processing steps until all the events in the event log have been processed; creating the statistical summary based on the statistical data.
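As an illustration only, the read, determine, and process loop described above might be sketched as follows in Python. The event representation (dictionaries with `call_id`, `type`, and `time` keys) and the event type strings are assumptions made for this sketch; the actual event log format is not specified in this section.

```python
from collections import defaultdict


def summarize(events):
    """Build a small statistical summary from an iterable of event dicts.

    Each event is assumed (hypothetically) to carry 'call_id', 'type', 'time'.
    """
    per_call = defaultdict(lambda: {"start": None, "end": None, "events": 0})

    for event in events:                      # read the current event value
        call = per_call[event["call_id"]]     # determine the call it belongs to
        call["events"] += 1                   # process call information values
        if event["type"] == "START_OF_CALL":
            call["start"] = event["time"]
        elif event["type"] == "END_OF_CALL":
            call["end"] = event["time"]

    # Create the summary once every event in the log has been processed.
    durations = [
        c["end"] - c["start"]
        for c in per_call.values()
        if c["start"] is not None and c["end"] is not None
    ]
    mean_duration = sum(durations) / len(durations) if durations else None
    return {"calls": len(per_call), "mean_duration": mean_duration}
```

Keying the intermediate data by call identifier lets events from interleaved calls appear in any order in the log, which seems consistent with determining the identity of the call for each event before processing it.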
In another feature, processing call information values further comprises creating and storing call initiation data for reporting on call initiation of each call when the current event is a Start of Call event.
According to another feature, processing call information values further comprises creating and storing call duration data for reporting on call duration for each call when the current event is an End of Call event.
In another feature, processing call information values further comprises determining whether any module data exists for a current module associated with a Start of Module event when the current event is the Start of Module event.
In another feature, processing call information values further comprises determining recognition context data when the current event is a Start of Recognition event.
According to another feature, processing call information values further comprises updating timing information associated with the current recognition context when a Beginning of Speech keyword is present and when the current event is a Start of Utterance event.
In another feature, processing call information values further comprises updating timing information associated with the current recognition context when the current event is a Recognition Timing event and the Start of Utterance event does not contain timing information.
In another feature, processing call information values further comprises creating and storing results of attempts to collect primary module data associated with the current module when the current event is an End of Module event.
According to yet another feature, processing call information values further comprises creating and storing recognition context results associated with the current recognition context when the current event is an End of Recognition event.
According to another feature, creating and storing call initiation data further comprises: incrementing a number of calls value represented in the event log; and incrementing a number of executions value and a number of disconnects value when the Start of Call event is encountered while processing the current module.
In another feature, creating and storing call duration data for reporting on call duration for each call further comprises: determining a call duration value represented in the event log; and incrementing a number of executions value and a number of disconnects value when the End of Call event is encountered while processing the current module.
In another feature, determining whether any module data exists for a current module further comprises: creating and storing a new module data element when no module data exists for the current module; and initializing the new module data element with current module information.
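The create-if-missing step for module data could look roughly like the sketch below. The field names (`module`, `time`, `first_seen`, `contexts`) are hypothetical and chosen only for illustration; the source does not say which module information is stored.

```python
def on_start_of_module(modules, event):
    """Return the data element for the event's module, creating it if needed.

    'modules' maps a module name to its data element. The keys used here
    are assumptions for this sketch, not the patent's actual structure.
    """
    name = event["module"]
    if name not in modules:
        # No module data exists yet: create a new element and initialize
        # it with the current module information.
        modules[name] = {
            "name": name,
            "first_seen": event["time"],
            "contexts": {},
        }
    return modules[name]
```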
In another feature, determining recognition context data further comprises: determining a current recognition context value based on the event log; initializing the recognition context data using the current recognition context value; updating the current module data based on a previous recognition context value when the current recognition context has a status value that is not Spelling, Confirmation or Start of Recognition; and setting the status value of the current recognition context to Start of Recognition.
In another feature, updating the current module data based on a previous recognition context value further comprises incrementing a value representing a number of acceptances associated with attempts to collect primary module data from the caller when the previous recognition context is Accepted.
According to another feature, updating the current module data based on a previous recognition context value further comprises incrementing a value representing a number of unknowns associated with attempts to collect primary module data from the caller when the previous recognition context is Confirmation.
In another feature, updating the current module data based on a previous recognition context value further comprises incrementing a value representing a number of rejections associated with attempts to collect primary module data from the caller when the previous recognition context is Rejected.
In another feature, updating the current module data based on a previous recognition context value further comprises incrementing a value representing a number of negative caller responses to confirmations associated with attempts to collect primary module data from the caller when the previous recognition context is Confirmed False.
In another feature, updating the current module data based on a previous recognition context value further comprises incrementing the number of affirmative caller responses to confirmations associated with attempts to collect primary module data from the caller when the status of the previous recognition context is Confirmed True.
In another feature, updating the current module data based on a previous recognition context value further comprises incrementing the number of collections of caller responses.
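The counter updates in the preceding features amount to a mapping from the status of the previous recognition context to a counter on the current module. A minimal sketch follows; the status strings and counter names are assumptions, and the mapping follows the features exactly as stated above (including Confirmation to unknowns), without modeling the condition that decides when the update runs at all.

```python
# Hypothetical status strings mapped to hypothetical counter names.
STATUS_TO_COUNTER = {
    "ACCEPTED": "accepted",
    "CONFIRMATION": "unknown",
    "REJECTED": "rejected",
    "CONFIRMED_FALSE": "confirmed_false",
    "CONFIRMED_TRUE": "confirmed_true",
}


def new_module_counters():
    """Create zeroed counters for one module."""
    counters = {name: 0 for name in STATUS_TO_COUNTER.values()}
    counters["collections"] = 0
    return counters


def update_module(counters, previous_status):
    """Update module counters from the previous recognition context status."""
    counter = STATUS_TO_COUNTER.get(previous_status)
    if counter is not None:
        counters[counter] += 1
    # The number of collections of caller responses is incremented each time.
    counters["collections"] += 1
    return counters
```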
In another feature, updating timing information further comprises: incrementing a value representing a number of utterances associated with the current recognition context; determining a time value for a beginning of speech based on the event log; determining a response time value based on the event log; determining whether the caller barged-in; and storing, in association with the current recognition context, the value representing the number of utterances, the time value, the response time value, and a value representing whether the caller barged in.
In another feature, updating the timing information further comprises: incrementing a value representing a number of utterances associated with the current recognition context; determining a time value for the beginning of speech based on the event log; determining a response time value based on the event log; determining whether the caller barged-in based on the event log; creating and storing a duration data when a keyword indicating speech duration is present in the event log; and storing, in association with the current recognition context, the value representing the number of utterances, the time value, the response time value, a value representing whether the caller barged in, and the duration data.
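The two timing features above differ only in whether speech duration data is stored, so a single routine with an optional field can sketch both. All key names (`begin_speech`, `response_time`, `barged_in`, `speech_duration`) are hypothetical stand-ins for whatever keywords the event log actually uses.

```python
def new_context():
    """Create empty timing data for one recognition context."""
    return {
        "utterances": 0,
        "begin_speech_times": [],
        "response_times": [],
        "barge_ins": 0,
        "durations": [],
    }


def update_timing(context, event):
    """Record timing information from one event on a recognition context."""
    context["utterances"] += 1
    context["begin_speech_times"].append(event["begin_speech"])
    context["response_times"].append(event["response_time"])
    if event.get("barged_in", False):
        context["barge_ins"] += 1
    # Duration data is stored only when the event carries a duration keyword.
    if "speech_duration" in event:
        context["durations"].append(event["speech_duration"])
    return context
```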
According to another feature, creating and storing results of attempts to collect primary module data further comprises: if a vocabulary item was previously stored, then: incrementing the number of times the vocabulary item has occurred; updating a vocabulary information to indicate that spelling was used when the vocabulary item had to be spelled; and updating the vocabulary information to indicate confirmation when the vocabulary item was confirmed.
In another feature, generating recognition context results further comprises storing speech duration data when a keyword indicating speech duration is present in the End of Recognition event.
In another feature, generating recognition context results further comprises the steps of: if the recognition context result is “ok”, then obtaining a list of vocabulary items and a match-confidence score; and incrementing the number of successful results for the current recognition context.
In another feature, generating recognition context results further comprises the steps of: if the in-vocabulary score and match-confidence score value are both greater than a first pre-determined value, then selectively setting the status of the current recognition context according to information in the log file and the match-confidence score value; selectively updating vocabulary data based on a value of the current recognition context; and incrementing one or more counter values that define actions taken by the caller according to values obtained from the log file.
According to another feature, generating a report of transaction results further comprises, for each module of the interactive speech application, creating and displaying a success rate value based on a sum of a percentage of transactions assumed to be correct and a percentage of transactions that ended in a command divided by the sum of the percentage of transactions assumed to be correct and a percentage of transactions that ended in the command and a percentage of transactions that failed.
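Read literally, the success rate above is the correct and command percentages summed, divided by the sum of the correct, command, and failed percentages. A small sketch under that reading, with hypothetical parameter names:

```python
def success_rate(pct_correct, pct_command, pct_failed):
    """Success rate = (correct + command) / (correct + command + failed).

    Returns None when there are no transactions to rate.
    """
    numerator = pct_correct + pct_command
    denominator = numerator + pct_failed
    if denominator == 0:
        return None
    return numerator / denominator
```

For example, a module with 70% of transactions assumed correct, 10% ending in a command, and 20% failed would have a success rate of 80/100 under this reading.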
In another feature, generating a report of results of attempts to collect primary module data further comprises, for each module of the interactive speech application, creating and displaying the percentage of attempts that were accepted, confirmed true, confirmed false, and rejected.
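One way to derive the four percentages is to divide each outcome count by the total number of attempts. The sketch below assumes that the four outcomes partition the attempts, which the text does not state explicitly; the counter names are hypothetical.

```python
def attempt_percentages(counts):
    """Convert per-outcome attempt counts for one module into percentages.

    'counts' maps outcome names (e.g. accepted, confirmed_true,
    confirmed_false, rejected) to integer counts.
    """
    total = sum(counts.values())
    if total == 0:
        return {outcome: 0.0 for outcome in counts}
    return {outcome: 100.0 * n / total for outcome, n in counts.items()}
```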
According to still another feature, generating a report of recognition context results further comprises creating and displaying a percentage of successes for each recognition context of each module, a percentage of failures for each recognition context of each module, a percentage of time-outs for each recognition context of each module, a percentage of occurrences when the caller spoke too long for each recognition context of each module, a percentage of stops for each recognition context of each module, a percentage of caller hang-ups for each recognition context of each module, a percentage of aborted operations for each recognition context of each module, and a percentage of errors that occurred for each recognition context of each module.
In another feature, generating a report of attempts to collect primary module data further comprises creating and displaying a percentage of times that the caller had to confirm each answer for each module, a percentage of times that the caller had to use a fallback mechanism for each answer for each module, and a percentage of times that the caller had to disambiguate each answer for each module.
In another feature, generating a report of results of attempts to collect primary module data further comprises creating and displaying an average duration of an utterance for each recognition context for each module, an average response time of a recognizer mechanism for each recognition context for each module, an average duration that the caller waited before speaking for each recognition context for each module, the speech duration of the caller as a percentage of the total time in the recognition context for each module, and the percentage of time that the caller barged-in for each recognition context for each module.
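The timing figures listed above could be derived from per-context data along the following lines. This is a sketch under assumptions: the input keys (`durations`, `response_times`, `waits`, `barge_ins`, `utterances`, `total_time`) are hypothetical, and the barge-in percentage is taken here as the share of utterances on which the caller barged in.

```python
def mean(values):
    """Arithmetic mean, or None for an empty list."""
    return sum(values) / len(values) if values else None


def timing_report(context):
    """Compute timing statistics for one recognition context of one module."""
    utterances = context["utterances"]
    total_time = context["total_time"]
    return {
        "avg_utterance_duration": mean(context["durations"]),
        "avg_response_time": mean(context["response_times"]),
        "avg_wait_before_speaking": mean(context["waits"]),
        # Speech time as a percentage of total time spent in the context.
        "speech_pct": (
            100.0 * sum(context["durations"]) / total_time if total_time else None
        ),
        "barge_in_pct": (
            100.0 * context["barge_ins"] / utterances if utterances else None
        ),
    }
```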
In other aspects, the invention encompasses a computer-readable medium, and a carrier wave configured to carry out the foregoing steps.