1. Field of the Invention
The present invention relates to the field of testing voice-enabled applications and, more particularly, to utilizing a session file store to automatically respond to prompts for input during system development and testing.
2. Description of the Related Art
Performing rapid and accurate speech processing tasks can require highly specialized, sophisticated, and resource-robust hardware and software, especially when real-time or near real-time processing is desired and/or when speech processing tasks are performed for a diverse population having different speech characteristics. For example, in performing speech recognition tasks, received speech utterances have to be parsed into processable segments, each utterance segment has to be converted into a symbolic representation, these symbolic representations have to be compared against valid words, and these words have to be processed according to grammar rules and/or utterance contexts. The resulting textual results can then be deciphered to determine what programmatic actions are to be taken responsive to the received speech input. Throughout this process, speaker-dependent characteristics, such as idiom, speaker language, accent, timbre, and the like, can be determined and suitable adjustments can be made to more accurately perform speech processing tasks.
To efficiently perform these tasks within a distributed environment, a common practice is to establish various speech processing engines, like automatic speech recognition engines and text-to-speech conversion engines. Often these speech processing engines will be arranged as clusters that handle a high volume of speech processing tasks, each cluster consisting of numerous approximately identical engines. The processing load is balanced across the engines of a cluster to ensure each received task can be handled within acceptable performance constraints.
When voice-enabled applications require a speech processing task to be performed, the task is conveyed to the appropriate cluster and a result is returned to the requesting voice-enabled application. To efficiently utilize speech processing resources and to load-balance received tasks among various engines in a cluster, speech processing tasks can be handled in discrete and independent segments, called dialogue turns. Notably, different turn-based subtasks can be processed in parallel or in series by different ones of the speech engines of the cluster. Results of the subtasks can be combined into a task result, which is conveyed to the requesting voice-enabled application.
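The turn-based handling described above can be illustrated with a minimal sketch. It assumes a simple round-robin assignment policy and stand-in engines that merely label their input; the names `make_engine` and `process_task` are hypothetical and are not taken from any particular speech platform. Actual clusters can use other balancing policies and, of course, real recognition or synthesis engines.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle
from typing import Callable, List

# Hypothetical stand-in: an engine is any callable that maps one
# dialogue turn to a result string.
Engine = Callable[[str], str]


def make_engine(name: str) -> Engine:
    """Build a placeholder engine that tags each turn with its own name."""
    def engine(turn: str) -> str:
        return f"{name}({turn})"
    return engine


def process_task(turns: List[str], cluster: List[Engine]) -> List[str]:
    """Assign turns to engines round-robin, run them in parallel,
    and combine the per-turn results in the original turn order."""
    if not cluster:
        raise ValueError("cluster must contain at least one engine")
    # Round-robin is an assumed policy chosen only for illustration.
    assignments = list(zip(turns, cycle(cluster)))
    with ThreadPoolExecutor(max_workers=len(cluster)) as pool:
        futures = [pool.submit(engine, turn) for turn, engine in assignments]
        # Collecting in submission order keeps the combined result ordered.
        return [future.result() for future in futures]


if __name__ == "__main__":
    cluster = [make_engine("asr-1"), make_engine("asr-2")]
    print(process_task(["turn-1", "turn-2", "turn-3"], cluster))
```

Because each turn is independent, the third turn in this sketch can be handled by the first engine again without any state being carried over from the first turn.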
While the structure described previously can be highly efficient for handling a significant load for a large number of voice-enabled applications using a minimal amount of speech processing resources, it can be extremely difficult to test and develop programs for this complex and interdependent environment. Conventional approaches for testing voice-enabled applications deployed in the aforementioned environment all have shortcomings.
One conventional approach is to deploy code to be tested within a production environment or a non-production equivalent of the production environment and to have a tester interact with the environment as if the tester were a typical user. For example, assume a typical user of a voice-enabled application is a telephone user and assume that the deployed code being tested is for a platform service that the voice-enabled platform uses. A tester can dial a telephone number to be connected with the environment that is being used to test the deployed code. The tester can then respond to each prompt and can listen to the speech output. If errors occur, the tester can examine an associated log file, modify the code, redeploy the code, and repeat the process. While this process provides a realistic means to test code deployed in a highly interdependent environment, such as platform service code, the process is tedious, time-consuming, and costly.
Another conventional approach is to utilize a simulator in which platform services execute locally. In the simulator, local versions of speech service engines, a voice-enabled browser, and a voice-enabled application can execute. A tester interacting with the simulator can provide speech input through a microphone and can hear generated speech output through an external speaker. These interactions, however, can be disturbing to others working in a shared work environment. The interactions can also be inaccurate in noisy environments, where recognition accuracy can be poor. Additionally, a tester who has particular speech characteristics can effectively and often inadvertently train the speech service engines to operate in a speaker-dependent fashion. It can be difficult, for example, for a tester to test speech recognition paths other than those default paths corresponding to the tester's speech characteristics, which can include prosody, idiom, and grammar.
One technique to overcome problems with testing based upon speech input is to permit a tester to provide text-based input that simulates speech input. For example, when testing a speech recognition engine, a tester can provide textual input specifying one or more speech-to-text results and associated confidence scores. These textual results and scores can be used as a testing substitute for the results that would otherwise be generated by speech-to-text converting a speech utterance. The use of textual input to test speech recognition code that normally requires audio input can be highly beneficial in many situations, as precisely manipulating confidence scores, n-best lists, and program code pathways can be extremely difficult using audio input. That is, the level of precision and control added through the use of text-based input during testing can expedite and enhance the thoroughness of the testing process.
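A minimal sketch of such text-based simulation follows. The `text:score; text:score` input format, the names `Hypothesis`, `parse_simulated_result`, and `choose_action`, and the threshold values are all assumptions introduced only for illustration; they do not correspond to any particular recognition engine or testing tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One simulated recognition result with its confidence (0.0 to 1.0)."""
    text: str
    confidence: float


def parse_simulated_result(line: str) -> List[Hypothesis]:
    """Parse a hypothetical 'text:score; text:score' line into an
    n-best list ordered from highest to lowest confidence."""
    hypotheses = []
    for entry in line.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last colon so the text itself may contain colons.
        text, _, score = entry.rpartition(":")
        hypotheses.append(Hypothesis(text.strip(), float(score)))
    return sorted(hypotheses, key=lambda h: h.confidence, reverse=True)


def choose_action(n_best: List[Hypothesis],
                  accept: float = 0.8,
                  confirm: float = 0.5) -> str:
    """Pick a dialog code path from the top hypothesis.
    The thresholds are illustrative assumptions."""
    if not n_best:
        return "reprompt"
    top = n_best[0]
    if top.confidence >= accept:
        return f"accept:{top.text}"
    if top.confidence >= confirm:
        return f"confirm:{top.text}"
    return "reprompt"


if __name__ == "__main__":
    n_best = parse_simulated_result("boston:0.62; austin:0.31")
    print(choose_action(n_best))
```

Typing a score such as 0.62 lets a tester drive the confirmation path deterministically, which would be hard to reproduce reliably with spoken input.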
Regardless of which of the above conventional testing methodologies is used, a tester typically is required to respond to the same prompts over and over again. For example, when testing a deep dialog branch for a session, a tester may have to respond to numerous dialog prompts (via phone input, via spoken simulator input, or via text-based simulated input) before reaching the dialog branch that needs to be tested. Continuously responding to prompts in order to reach a point of interest in code can be very unproductive and aggravating. Additionally, a bored tester lulled by the necessity of repetitively responding to the same initial prompts can make input mistakes, further frustrating the tester and exacerbating an already tedious process.