The discussion below is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.
Small computing devices such as personal digital assistants (PDAs) and portable phones are used with ever increasing frequency by people in their day-to-day activities. With the increase in processing power now available for the microprocessors used to run these devices, the functionality of these devices is increasing and, in some cases, merging. For instance, many portable phones can now be used to access and browse the Internet, as well as to store personal information such as addresses, phone numbers and the like.
Because these computing devices are being used with increasing frequency, it is necessary to provide an easy interface for the user to enter information into the computing device. Unfortunately, the desire to keep these devices as small as possible so that they are easily carried means that conventional keyboards having all the letters of the alphabet as isolated buttons are usually not possible, given the limited surface area available on the housings of the computing devices. Even beyond the example of small computing devices, there is interest in providing a more convenient interface for all types of computing devices.
To address this problem, there has been increased interest in, and adoption of, voice or speech as a means of accessing information, whether locally on the computing device, over a local network, or over a wide area network such as the Internet. With speech recognition, a dialog interaction is generally conducted between the user and the computing device. The user typically receives information audibly and/or visually, and responds audibly to prompts or issues commands. However, it is often desirable to ascertain the performance of the application during development or after it has been deployed. In particular, it is desired to ascertain usage and/or success rates of users with the application from logged data. With such information, the developer may be able to “tune” (i.e., make adjustments to) the application in order to better meet the needs of its users. For example, it may be helpful to identify portions of the dialog between the application and the users where problems are most likely to be encountered. In this manner, those portions of the dialog can be adjusted to alleviate confusion.
Nevertheless, determining dialog problems from the log data of deployed applications (e.g., speech and DTMF) is difficult. Dialog problems are essentially user experience problems with the flow of the interaction. They typically result in user frustration, followed by a hang-up or a request for operator assistance. In addition, dialog problems are costly to the entity deploying the application in terms of customer ill-will as well as support expenses.
While the symptoms of dialog problems are fairly clear (low task completion rates and increases in hang-ups or other cancellations), the causes of such problems can be extremely hard to find. Typical dialog problems tend to result from mismatches between the system's and the user's understanding of the task at hand. They commonly arise from lower-level application problems such as prompts that are confusing, or paths that are mistakenly taken (through system error or user misunderstanding).
A large volume of session data is typically required to conduct a diagnosis, yet that same volume makes manual analysis of the data a long and tedious process. For instance, lengthy stretches of dialog are generally required in order for the full picture of confusion to surface, and that picture must then be generalized across users. Furthermore, pinpointing the source location of the problem (the dialog state where the confusion begins) is difficult, because for any given hang-up or other user cancellation the source of the problem may be several turns prior to the cancellation. In addition, speech applications tend to differ so broadly in their user interaction model that implementations of automated analysis are generally application-specific and limited in extensibility. Lastly, for speech recognition applications, the imperfection of speech recognizers means that a true analysis of user behavior must generally be founded on manual transcriptions of the user input, which is a secondary and typically costly process.