1. Field of the Invention
The present invention relates to the field of network diagnostics, and, more particularly, to diagnosing voice application issues.
2. Description of the Related Art
Performing rapid and accurate speech processing tasks can require highly specialized, sophisticated, and resource-robust hardware and software, especially when real-time or near real-time processing is desired and/or when speech processing tasks are performed for a diverse population having different speech characteristics. For example, in performing speech recognition tasks, received speech utterances have to be parsed into processable segments, each utterance segment has to be converted into a symbolic representation, these symbolic representations have to be compared against valid words, and these words have to be processed according to grammar rules and/or utterance contexts. The resulting text can then be interpreted to determine what programmatic actions are to be taken responsive to the received speech input. Throughout this process, speaker-dependent characteristics, such as idiom, speaker language, accent, timbre, and the like, can be determined and suitable adjustments can be made to more accurately perform speech processing tasks.
To efficiently perform these tasks within a distributed environment, a common practice is to establish various speech processing engines, like automatic speech recognition engines and text-to-speech conversion engines. Often these speech processing engines will be arranged as a cluster of speech processing engines that handle a high volume of speech processing tasks, each cluster consisting of numerous approximately identical engines where the processing load is balanced across the approximately equivalent engines of a cluster to assure each received task can be handled within acceptable performance constraints.
When voice-enabled applications require a speech processing task to be performed, the task is conveyed to the appropriate cluster and a result is returned to the requesting voice-enabled application. To efficiently utilize speech processing resources and to load-balance received tasks among various engines in a cluster, speech processing tasks can be handled in discrete and independent segments, called dialogue turns. Notably, different turn-based subtasks can be processed in parallel or in series by different ones of the speech engines of the cluster. Results of the subtasks can be combined into a task result, which is conveyed to the requesting voice-enabled application.
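The turn-based handling described above can be illustrated with a minimal sketch. It assumes a round-robin balancing policy and uses hypothetical stand-in engines (make_engine, process_task, and the engine names are illustrative only and do not correspond to any particular speech product). Each dialogue turn is dispatched to one engine of the cluster, the turns are processed in parallel, and the per-turn results are combined in turn order into a single task result.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def make_engine(name):
    """Return a hypothetical stand-in for a speech engine.

    A real engine would recognize audio; this one only tags the turn
    with the engine name so the dispatch pattern is visible.
    """
    def engine(turn):
        return f"{name}:{turn.upper()}"
    return engine


def process_task(turns, engines):
    """Dispatch each dialogue turn to an engine and combine the results.

    Round-robin assignment is an assumed balancing policy; a real
    cluster may balance on load, latency, or engine capability.
    """
    assignments = list(zip(turns, cycle(engines)))
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = [pool.submit(engine, turn) for turn, engine in assignments]
        # Collect in submission order so the task result follows turn order.
        return [future.result() for future in futures]


engines = [make_engine("asr-1"), make_engine("asr-2")]
result = process_task(["hello", "check balance", "goodbye"], engines)
print(result)
```

Because the results are collected in submission order, the combined task result preserves the order of the dialogue turns even when the engines finish at different times.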
While the aforedescribed structure can be highly efficient for handling a significant load for a large number of voice-enabled applications using a minimal amount of speech processing resources, it can be extremely difficult to diagnose problems within this complex and interdependent environment. That is, once an application is deployed into an operational environment, debugging problems in the field can consume time, skilled technician resources, and computing resources. Conventional approaches for diagnosing problems all have shortcomings.
One problem diagnosis technique, for example, is to establish component-specific logs, each log recording the events that transpire within its component. The highly interactive nature of this distributed, turn-based, operational environment, however, results in huge activity logs. Tracing these logs to diagnose problems is an extremely cumbersome process that requires logs of an application server to be compared with speech processing component logs, network traffic logs, and the like. Also, the process of extensively logging all activities in a comprehensive manner can require significant memory and can consume substantial processing resources, which can affect the run-time performance of the operational system.
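The log comparison described above can be pictured with a small sketch. It assumes each component writes entries as (timestamp, component, message) tuples already sorted by time; the entry layout and the sample entries are hypothetical. Merging the component logs into one timeline is only the first step of tracing a problem, which hints at why the process is cumbersome when each log holds a very large number of entries.

```python
import heapq

# Hypothetical component-specific logs, each sorted by timestamp (seconds).
app_log = [
    (1.0, "app", "turn 1 sent"),
    (3.0, "app", "turn 1 result received"),
]
asr_log = [
    (1.5, "asr", "turn 1 received"),
    (2.5, "asr", "turn 1 recognized"),
]

# heapq.merge lazily interleaves already-sorted inputs into one timeline.
merged = list(heapq.merge(app_log, asr_log, key=lambda entry: entry[0]))

for timestamp, component, message in merged:
    print(f"{timestamp:>5.1f}  {component:<4} {message}")
```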
When the speech processing engine is a speech recognition engine, one of the most resource-consuming aspects of event logging relates to speech utterances. That is, to be able to later re-create an operational scenario within a maintenance environment, one conventional approach is to record the audio files or speech utterances for each dialogue turn along with the specific settings used to speech-recognize these turn-based audio files. Audio files are extremely large files, which rapidly consume memory. For instance, a single hour of audio recording can fill a hard drive, which would otherwise be capable of storing months' worth of log information. What is needed is a way to recreate the conditions for diagnostic purposes that does not have the limitations of conventional techniques.
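The storage burden of recorded utterances can be estimated with a short calculation. The figures below are assumptions chosen for illustration and are not taken from any particular system: uncompressed telephony-grade audio at 8,000 samples per second and one byte per sample, and a textual log entry of 200 bytes. Actual sizes depend on the codec, the number of concurrent calls, and the logging format.

```python
def audio_bytes(seconds, sample_rate_hz=8000, bytes_per_sample=1, concurrent_calls=1):
    """Estimate storage for uncompressed audio (all defaults are assumed values)."""
    return seconds * sample_rate_hz * bytes_per_sample * concurrent_calls


ASSUMED_LOG_ENTRY_BYTES = 200  # hypothetical size of one textual log entry

one_hour_one_call = audio_bytes(3600)
one_hour_hundred_calls = audio_bytes(3600, concurrent_calls=100)

# Number of textual log entries that fit in the space of one hour of audio.
equivalent_entries = one_hour_one_call // ASSUMED_LOG_ENTRY_BYTES

print(one_hour_one_call, one_hour_hundred_calls, equivalent_entries)
```

Under these assumptions, the space taken by one hour of audio from a single call would hold on the order of a hundred thousand textual log entries, and the audio requirement grows linearly with the number of concurrent calls being recorded.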