Companies are often dependent on mission-critical network applications to stay productive and competitive. To achieve this, information technology (IT) organizations preferably provide reliable application performance on a 24-hour, 7-day-a-week basis. One known approach to network performance testing to aid in this task is described in U.S. Pat. No. 5,881,237 (“the 237 patent”) entitled Methods, Systems and Computer Program Products for Test Scenario Based Communications Network Performance Testing, which is incorporated herein by reference as if set forth in its entirety. As described in the '237 patent, a test scenario simulating actual applications communication traffic on the network is defined. The test scenario may specify a plurality of endpoint node pairs on the network that are to execute respective test scripts to generate active traffic on the network. Various performance characteristics are measured while the test is executing. The resultant data may be provided to a console node, coupled to the network, which may also initiate execution of the test scenario by the various endpoint nodes. The endpoint nodes may execute the tests as application level programs on existing endpoint nodes of a network to be tested, thereby using the actual protocol stacks of such devices without reliance on the application programs available on the endpoints.
One application area of particular interest currently is in the use of a computer network to support voice communications. More particularly, packetized voice communications are now available using data communication networks, such as the Internet and intranets, to support voice communications typically handled in the past over a conventional switched telecommunications network (such as the public switched telephone network (PSTN)). Calls over a data network typically rely on codec hardware and/or software for voice digitization so as to provide the packetized voice communications. However, unlike conventional data communications, user perception of call quality for voice communications is typically based on their experience with the PSTN, not with their previous computer type application experiences. As a result, the types of network evaluation supported by the various approaches to network testing described above are limited in their ability to model user satisfaction for this unique application.
A variety of different approaches have been used in the past to provide a voice quality score for voice communications. The conventional measure from the analog telephone experience is the Mean Opinion Score (MOS) described in ITU-T recommendation P.800 available from the International Telecommunications Union. In general, the MOS is derived from the results of humans listening and grading what they hear from the perspective of listening quality and listening effort. A Mean Opinion Score ranges from a low of 1.0 to a high of 5.0.
The MOS approach is beneficial in that it characterizes what humans think at a given time based on a received voice signal. However, human MOS data may be expensive and time consuming to gather and, given its subjective nature, may not be easily repeatable. The need for humans to participate as evaluators in a test every time updated information is desired along with the need for a voice over IP (VoIP) equipment setup for each such test contribute to these limitations of the conventional human MOS approach. Such advance arrangements for measurements may limit when and where the measurements can be obtained. Human MOS is also generally not well suited to tuning type operations that may benefit from simple, frequent measurements. Human MOS may also be insensitive to small changes in performance, such as those used for tuning network performance by determining whether or not an incremental performance change following a network change was an improvement.
Objective approaches include the perceptual speech quality measure (PSQM) described in ITU-T recommendation P.861, the perceptual analysis measurement system (PAMS) described by British Telecom, the measuring normalized blocks (MNB) measure described in ITU-T P.861 and the perceptual evaluation of speech quality (PESQ) described in ITU-T recommendation P.862. Finally, the E-model, which describes an “R-value” measure, is described in ITU-T recommendation G.107. The PSQM, PAMS and PESQ approaches typically compare analog input signals to output signals that may require specialized hardware and real analog signal measurements.
From a network perspective, evaluation for voice communications may differ from conventional data standards, particularly as throughput and/or response time may not be the critical measures. A VoIP phone call generally consists of two flows, one in each direction. Such a call typically does not need much bandwidth. However, the quality of a call, how it sounds, generally depends on three things: the one-way delay from end to end, how many packets are lost and whether that loss is in bursts, and the variation in arrival times, herein referred to as jitter.
In light of these differences, it may be desirable to determine if a network is even capable of supporting VoIP before deployment of such a capability. If the initial evaluation indicates that performance will be unsatisfactory or that existing traffic will be disrupted, it would be helpful to determine what to change in the network architecture to provide an improvement in performance for both VoIP and the existing communications traffic. As the impact of changes to various network components may not be predictable, thus requiring empirical test results, it would also be desirable to provide a repeatable means for iteratively testing a network to isolate the impact of individual changes to the network configuration.
However, the various voice evaluation approaches discussed above do not generally factor in human perception, acoustics or the environment effectively in a manner corresponding to human perception of voice quality. Such approaches also typically do not measure in two directions at the same time, thus, they may not properly characterize the two flows of a VoIP call, one in each direction. These approaches also do not typically scale to multiple simultaneous calls or evaluate changes during a call, as compared with a single result characterizing the entire call. Of these models, only the E-model is generally network based in that it may take into account network attributes, such as codec, jitter buffer, delay and packet loss and model how these affect call quality scores.
An approach for testing network performance is discussed in commonly assigned U.S. patent application Ser. No. 09/951,050, filed on Sep. 11, 2001, the disclosure of which is hereby incorporated herein by reference as if set forth herein in its entirety. This patent application, entitled Methods, Systems and Computer Program Products for Packetized Voice Network Evaluation, addresses many of the shortcomings of the existing approaches discussed above. However, improved methods of assessing network performance as well as methods of presenting performance results to a customer or user may be desirable.