This invention relates to automated testing of a Voice Response System (VRS), and more particularly to testing the correctness and speech quality of VRS prompts using a Perceptual Speech Distortion Metric (PSDM).
Automated Voice Response Systems include applications such as Auto-Attendants (AA), voice mail and voice menus. A user navigates through a VRS menu by pressing keys on a standard touch-tone telephone. Pressing the keys generates Dual Tone Multi-Frequency (DTMF) signals. The VRS responds to the DTMF signals by generating speech signals, hereafter known as 'prompts'.
When a call is established with the VRS, the VRS plays out a particular speech file that invites the user to respond by pressing a telephone key (0-9,*,#). Depending on the key pressed, the VRS responds by playing out an appropriate prompt inviting a further user response. The process of prompt and user response is repeated until the user accesses the right service or is connected with the correct department, etc. VRS applications have state machines that define what prompt is played and the acceptable user response, i.e., the states that are reachable from the current state. A map of these states and the allowable transitions among the states is referred to as a state tree or state machine.
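The state tree described above can be illustrated with a minimal sketch. It assumes a hypothetical four-state VRS; the state names, prompt file names and the rule that an unaccepted key leaves the state unchanged are all invented for illustration and are not taken from any particular VRS.

```python
# Hypothetical VRS state tree: each state names the prompt it plays and maps
# each accepted key to the next state. All names here are invented.
STATE_TREE = {
    "main": {"prompt": "main_menu.wav", "keys": {"1": "sales", "2": "support"}},
    "sales": {"prompt": "sales_menu.wav", "keys": {"*": "main"}},
    "support": {"prompt": "support_menu.wav", "keys": {"1": "billing", "*": "main"}},
    "billing": {"prompt": "billing.wav", "keys": {}},
}


def expected_prompts(tree, start, keys):
    """Return the prompts a correct VRS should play for a key sequence.

    Assumption of this sketch: a key that is not accepted in the current
    state leaves the state unchanged, so the same prompt is expected again.
    """
    state = start
    prompts = [tree[state]["prompt"]]  # prompt played when the call is answered
    for key in keys:
        state = tree[state]["keys"].get(key, state)
        prompts.append(tree[state]["prompt"])
    return prompts


if __name__ == "__main__":
    print(expected_prompts(STATE_TREE, "main", ["2", "1"]))
```

A table of this kind gives a test script the sequence of prompts to expect for any sequence of key presses, which is the reference against which the played prompts are checked.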
The VRS needs to be tested to determine whether particular keypresses are decoded correctly and whether the correct prompt or recorded voice is played back. There are two major components to testing VRSs. One testing component tests how well the VRS accepts DTMF tones conforming to certain time and frequency standards and rejects those DTMF tones that do not. A second component tests the logical integrity or consistency of the VRS state machine. Given a valid DTMF tone, this testing component verifies that the VRS state machine progresses correctly through the indicated or desired states.
One testing method is to walk through the VRS state tree manually, using an operator's hand and ear to identify any perceived logical errors in the system. This manual testing method does not scale well for monitoring the performance of the VRS under load conditions. It would be difficult and expensive for a few hundred people to repeatedly dial up and listen to the same VRS at the same time.
An automated test method uses a speech recognition engine to verify proper VRS prompt responses. Repeated and possibly simultaneous calls are automatically made to the VRS under test. DTMF tones are automatically generated according to a script. Speech recognition technology is then used to identify the voice prompt as correct or incorrect by comparing the received speech with stored templates.
This automated test method is workable, but lacks robustness. For example, classification of speech is not 100% reliable even under perfect speech transmission conditions. Standard band-limited telephony channels present difficulties in accurately recognizing VRS voice prompts. Transmission problems, such as lost packets in a VoIP network and the use of low-bit-rate speech coders, reduce the ability to accurately recognize voice prompts. Speech recognition engines are also computationally intensive and require substantial time and effort for training. Because speech recognition engines are prohibitively time-consuming to develop, designers are often forced to license expensive third-party software.
Outputs from speech recognition engines are essentially binary: correct or incorrect. However, when the VRS is under load due to high call volume, the prompts played out may be correct, but the output audio signal may be distorted. The level of distortion may be small enough that a listener can still understand the prompt. On the other hand, the distortion may be so great that the listener cannot understand the voice prompt. Unfortunately, the speech recognition engine can only classify the prompts as 'perfectly correct' or 'perfectly incorrect'.
Accordingly, a need remains for a simple low-cost system that more effectively tests Voice Response Systems.
The Voice Quality Test (VQT) platform uses a Perceptual Speech Distortion Metric (PSDM) such as, but not limited to, ITU standard P.861 (PSQM) to effectively test Voice Response Systems (VRS). The VQT platform automatically initiates an off-hook condition and dials a VRS phone number over a telephone line. The VRS at the dialed phone number answers the phone call and sends an initial voice prompt to the VQT platform. A signal generator on the VQT platform generates sequences of DTMF tones that progress through the state tree of the VRS according to a user test script. The VRS responds with voice prompts that are recorded by a signal recorder on the VQT platform.
A reference speech library in the VQT platform contains reference signals representing the correct voice prompts for each one of the states in the VRS. The PSDM generates a perceptual distortion value for each voice prompt received from the VRS by comparing the received voice prompt with the reference signals associated with the same VRS state. The perceptual distortion values are used to identify the received voice prompts as either correct or incorrect responses to the signal generator DTMF tones. The perceptual distortion values also have the advantage of quantifying different amounts of perceptual distortion in the voice prompts.
By using the perceptual speech distortion metric, the VQT platform can more accurately distinguish correct voice prompts from incorrect voice prompts. In addition, the VQT platform can identify correct voice prompts that, due to distortion, are either difficult to understand or completely unintelligible. This provides more detailed and accurate analysis of Voice Response Systems using relatively simple testing equipment.
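The use of a distortion value to classify a received prompt can be sketched as follows. This is a minimal illustration only: `toy_distortion` is a stand-in that computes a mean squared sample difference and is not an implementation of P.861 or of any perceptual metric, and the thresholds `clean_max` and `correct_max` are hypothetical values that would have to be calibrated for whichever metric is actually used.

```python
def toy_distortion(received, reference):
    """Stand-in for a PSDM: mean squared sample difference (NOT P.861)."""
    n = min(len(received), len(reference))
    if n == 0:
        raise ValueError("both signals must contain samples")
    return sum((received[i] - reference[i]) ** 2 for i in range(n)) / n


def classify_prompt(received, reference, metric=toy_distortion,
                    clean_max=0.01, correct_max=0.25):
    """Map a distortion value to one of three outcomes.

    clean_max and correct_max are hypothetical thresholds chosen only so
    that the example below produces all three outcomes.
    """
    distortion = metric(received, reference)
    if distortion <= clean_max:
        label = "correct"
    elif distortion <= correct_max:
        label = "correct but distorted"
    else:
        label = "incorrect"
    return label, distortion


if __name__ == "__main__":
    reference = [0.0, 0.5, -0.5, 0.25]
    for received in (reference,
                     [0.2, 0.7, -0.3, 0.45],
                     [1.0, -0.5, 0.5, -0.75]):
        print(classify_prompt(received, reference))
```

Unlike a binary recognizer output, the returned distortion value remains available alongside the label, so a prompt that is correct but degraded under load can be reported as such.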
A further testing capability follows from the invention's ability to recognize whether a received voice prompt is correct or incorrect. The invention controls the VRS under test by generating DTMF tones. A VRS must classify incoming DTMF tones as valid or invalid based on the duration and frequency content of these tones. For example, a DTMF tone of only 20 milliseconds (ms) duration should not be accepted by the VRS, and should not result in a state change. The DTMF generator embodied in the invention offers control over tone timing (digit duration and inter-digit silence duration), and independent control over DTMF tone levels and frequencies. Through this function, the VRS under test can be stimulated with tones that are either valid or invalid, and the corresponding acceptance or rejection of these tones by the VRS is monitored.
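A DTMF generator with the kinds of control described above might be sketched as follows. The row and column frequencies are the standard DTMF keypad assignments; the function names, the default durations and amplitudes, the 8000 Hz sample rate and the fractional `freq_offset` parameter are assumptions made for illustration, not details of the generator embodied in the invention.

```python
import math

# Standard DTMF keypad frequencies in Hz (12-key layout).
ROW_HZ = (697.0, 770.0, 852.0, 941.0)
COL_HZ = (1209.0, 1336.0, 1477.0)
KEYPAD = ("123", "456", "789", "*0#")


def dtmf_frequencies(key):
    """Return the (low, high) frequency pair for a single keypad character."""
    if len(key) == 1:
        for r, row in enumerate(KEYPAD):
            c = row.find(key)
            if c >= 0:
                return ROW_HZ[r], COL_HZ[c]
    raise ValueError("not a DTMF key: %r" % (key,))


def dtmf_digit(key, tone_ms=80, gap_ms=80, low_amp=0.4, high_amp=0.5,
               freq_offset=0.0, sample_rate=8000):
    """Return samples for one digit followed by inter-digit silence.

    tone_ms and gap_ms set the timing, low_amp and high_amp set the two
    tone levels independently, and freq_offset shifts both frequencies by
    a fraction (0.03 means +3%). All defaults are hypothetical.
    """
    low, high = dtmf_frequencies(key)
    low *= 1.0 + freq_offset
    high *= 1.0 + freq_offset
    n_tone = sample_rate * tone_ms // 1000
    n_gap = sample_rate * gap_ms // 1000
    tone = [low_amp * math.sin(2.0 * math.pi * low * n / sample_rate)
            + high_amp * math.sin(2.0 * math.pi * high * n / sample_rate)
            for n in range(n_tone)]
    return tone + [0.0] * n_gap


if __name__ == "__main__":
    # A deliberately short 20 ms digit, which the VRS should reject.
    short_digit = dtmf_digit("5", tone_ms=20, gap_ms=0)
    print(len(short_digit), dtmf_frequencies("5"))
```

Driving the VRS with a digit such as `short_digit` and confirming that the same prompt is played again is one way the rejection of invalid tones could be monitored.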
The foregoing and other objects, features and advantages of the invention will become more readily apparent from the following detailed description of a preferred embodiment of the invention which proceeds with reference to the accompanying drawings.