The present invention generally relates to the field of performance testing of processor bound voice processing engines. More particularly, the present invention relates to detecting underflow conditions to determine the number of voice processing engines capable of being supported by a data processing system.
Messaging systems often provide voice processing capabilities that are utilized to both process and simulate human speech. One example of such a messaging system is the Network Applications Platform (NAP) commercially available from UNISYS Corporation ("the NAP system"). The NAP is a configuration of hardware and software that provides data and voice processing capabilities through applications running on a host computer system. The NAP, in combination with a network interface unit (NIU), provides the interface between these applications, called network applications, and a telephone network. The NAP is implemented on selected Unisys A Series and ClearPath HMP NX computer systems running the Unisys MCP operating system. Further details of the structure and function of the NAP are provided in the following issued patents and pending applications, all of which are hereby incorporated by reference in their entireties:
U.S. Pat. No. 5,133,004, issued Jul. 21, 1992, entitled "Digital Computer Platform for Supporting Telephone Network Applications";
U.S. Pat. No. 5,323,450, issued Jun. 21, 1994, entitled "Telephone Network Applications Platform for Supporting Facsimile Applications";
U.S. Pat. No. 5,384,829, issued Jan. 24, 1995, entitled "Digital Computer Platform for Supporting Telephone Network Applications";
U.S. Pat. No. 5,493,606, issued Feb. 20, 1996, entitled "Multi-Lingual Prompt Management System for a Network Applications Platform";
U.S. Pat. No. 6,058,166, issued May 2, 2000, entitled "Enhanced Multi-Lingual Prompt Management in a Voice Messaging System With Support for Speech Recognition";
U.S. patent application Ser. No. 09/161,214, filed Sep. 25, 1998, entitled "Multiple Node Messaging System Wherein Nodes Have Shared Access To Message Stores Of Other Nodes";
U.S. patent application Ser. No. 09/307,014, filed May 7, 1999, entitled "Inter-System Call Transfer";
U.S. patent application Ser. No. 09/451,077, filed Nov. 30, 1999, entitled "Method and Apparatus for Preventing Hung Calls During Protocol Violations in a Voice Messaging System";
U.S. patent application Ser. No. 09/636,656, filed Aug. 11, 2000, entitled "Network Interface Unit Having an Embedded Services Processor"; and
U.S. patent application Ser. No. 09/636,677, filed Aug. 11, 2000, entitled "Adjunct Processing Of Multi-Media Functions In A Universal Messaging System".
In providing voice messaging systems, it is sometimes necessary to provide a scalable system that offers a multitude of voice processing engines, such as Text to Speech (TTS) engines, Automated Speech Recognition (ASR) engines, or Natural Language Understanding (NLU) engines, running concurrently to service the needs of multiple simultaneous human users. However, voice processing engines are processor bound and, thus, only a limited number can run on a particular data processing system with a particular configuration. If the data processing system operates in real time (i.e., it is handing audio to a person instead of saving it, for example, to a file), the timely processing of voice information (e.g., voice generation or voice recognition) is critical. For example, in the case of TTS processing, if the data processing system does not generate speech in a timely fashion, as judged from the perspective of the human user, a condition called an underflow occurs. To the human user listening to the audio, an underflow will sound like breaks in the audio and generally results in an unpleasant listening experience.
Determining the number of voice processing engines that can run on a particular data processing system can be problematic. Such testing normally requires expensive hardware, such as telephony hardware, to discern an approximate number by simulating a number of human users connected to the system. This method is expensive and time consuming. Furthermore, using telephony hardware means that one must provide some kind of stimulus to start the system.
In the simplest case, a number of callers dial into the messaging system and simultaneously launch a voice processing engine. This method is difficult to coordinate and synchronize and is not conducive to running multiple tests. Alternatively, one could use a system that automatically dials in to the messaging system and launches a voice processing engine. However, such an automatic system would likely be even worse at detecting underflow conditions than would a human. For example, if the voice processing engine is a TTS engine, the engine inputs text data and outputs speech data in an audio format. The automatic system would have to analyze the audio to determine whether an underflow condition has occurred. Very minor underflows would likely not be detected because they would have only a minor impact on the audio output. Also, the criteria for judging the underflow experience would be subjective rather than an objective measurement of the system's capabilities.
In view of the above problems, there is a recognized need for a system and method which can objectively discern the number of voice processing engines that can be run simultaneously on a particular data processing system without the installation of expensive hardware. The present invention satisfies this need.
This invention is directed to systems and methods for determining how many voice processing engines (e.g., Text to Speech (TTS) engines, Automated Speech Recognition (ASR) engines, or Natural Language Understanding (NLU) engines) a particular data processing system, or computer system, can run without causing underflow conditions.
For example, in the context of TTS processing, a computer system may run one or more TTS engines to provide voice synthesis. Text data is input to a TTS engine, and the engine converts the text formatted data to speech formatted data and outputs that speech data to an audio object, for example, for playing the speech over speakers. If the TTS engine does not perform its conversion quickly enough, there will not be enough speech data for the audio object to process, which is considered an underflow condition. The resulting speech will include breaks and may become unrecognizable. The present invention can be used in this context to determine how many TTS engine instances a particular computer system can run without causing audio underflows.
An embodiment of the present invention includes three components: a test application for configuring test parameters, launching voice processing engines, and monitoring results; at least one voice processing engine under test; and a timing object (rather than, for example, an audio object used in TTS processing).
The test application allows the user to configure test parameters, such as selecting a particular voice processing engine to be tested and a maximum buffer size for the timing object. The test application then launches a voice processing engine (e.g., a TTS engine). The engine then processes input data, performs a conversion of the data to a different format, and outputs the converted data to the timing object. For example, if the voice processing engine is a TTS engine, the engine receives text data (e.g., the text string "testing, one, two, three"), processes the text data into speech data, and outputs the speech data to the timing object.
The timing object simulates a real audio object and tracks an amount of data remaining in the timing object. Preferably, the timing object includes a counter to track the amount of data remaining in the timing object. Alternatively, the timing object may include a buffer to track the amount of data remaining in the timing object.
The timing object regularly receives data from a voice processing engine. The timing object also simulates a regular extraction of a specified amount of data from the timing object. The counter or the buffer of the timing object is used to track the amount of data remaining in the timing object as data is received from the voice processing engine and "removed" at regular intervals by the timing object. If the timing object tries to remove more data than remains in the timing object, it reports an underflow, unless the voice processing engine has signaled that it has completed processing the data.
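By way of illustration only, the counter-based timing object described above might be sketched as follows in Python. The class name, method names, and byte values are hypothetical and are not taken from any actual implementation; the sketch assumes that the caller drives the regular extraction by invoking tick() once per interval.

```python
class TimingObject:
    """Counter-based stand-in for a real audio object (all names hypothetical)."""

    def __init__(self, drain_bytes, max_buffer_bytes):
        self.drain_bytes = drain_bytes            # amount "removed" at each interval
        self.max_buffer_bytes = max_buffer_bytes  # configured maximum buffer size
        self.remaining = 0                        # counter: data still in the object
        self.engine_done = False                  # set when the engine signals completion
        self.underflows = 0                       # number of underflows reported

    def write(self, nbytes):
        """Receive data from the engine; return how many bytes were accepted."""
        accepted = min(nbytes, self.max_buffer_bytes - self.remaining)
        self.remaining += accepted
        return accepted

    def signal_done(self):
        """Called by the engine when it has finished processing its input."""
        self.engine_done = True

    def tick(self):
        """Simulate one regular extraction; return True if an underflow occurred."""
        if self.remaining >= self.drain_bytes:
            self.remaining -= self.drain_bytes
            return False
        # Fewer bytes remain than the extraction needs.
        self.remaining = 0
        if self.engine_done:
            return False  # the engine has finished, so running dry is expected
        self.underflows += 1
        return True


# Example: the engine delivers two intervals' worth of data, then stalls.
timer = TimingObject(drain_bytes=160, max_buffer_bytes=1600)
timer.write(320)
results = [timer.tick(), timer.tick(), timer.tick()]  # third extraction underflows
timer.signal_done()
results.append(timer.tick())                          # no underflow once the engine is done
```

Because only a counter is maintained, no speech data needs to be stored or played, which is what allows the timing object to take the place of an audio object during testing.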
With this system, the test application can successively launch multiple voice processing engines until an underflow condition is detected, which determines the capacity of the data processing system to handle multiple voice processing engines.
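The successive-launch procedure can be illustrated with the following simplified Python sketch. It is a simulation under stated assumptions rather than a description of the actual test application: the function names and parameters are hypothetical, the processor is assumed to produce a fixed total amount of data per interval that is divided evenly among the running engines, and each engine count is evaluated in a separate trial.

```python
def run_trial(num_engines, cpu_bytes_per_tick, drain_bytes, ticks, prefill_ticks=2):
    """Return True if any simulated engine underflows during the trial.

    Assumption: the processor yields cpu_bytes_per_tick in total, shared
    evenly, so each engine slows down as more engines are launched.
    """
    per_engine = cpu_bytes_per_tick // num_engines
    # One counter per engine, standing in for that engine's timing object.
    remaining = [per_engine * prefill_ticks] * num_engines
    for _ in range(ticks):
        for i in range(num_engines):
            remaining[i] += per_engine        # engine delivers its data
            if remaining[i] < drain_bytes:    # extraction would remove too much
                return True
            remaining[i] -= drain_bytes       # simulated regular extraction
    return False


def find_capacity(cpu_bytes_per_tick, drain_bytes, ticks=100, max_engines=64):
    """Add engines one at a time until a trial reports an underflow."""
    supported = 0
    for n in range(1, max_engines + 1):
        if run_trial(n, cpu_bytes_per_tick, drain_bytes, ticks):
            break
        supported = n
    return supported


# Hypothetical figures: 1000 bytes of output per interval in total,
# 160 bytes extracted from each timing object per interval.
capacity = find_capacity(cpu_bytes_per_tick=1000, drain_bytes=160)
```

In this sketch the reported capacity is the largest engine count for which no timing object reported an underflow over the duration of the trial; the maximum buffer size is omitted for brevity.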