VoIP Services
VoIP (Voice over IP, where IP denotes the Internet Protocol) networks are packet-switched telephone networks. In contrast to their circuit-switched predecessors (e.g. the PSTN), the control plane (signaling information, i.e. who calls whom) may take a different path through the network than the media plane (the call content). The media plane is sometimes also referred to as the user plane. VoIP services can thus be considered to consist of a signaling plane and a media plane. On the signaling plane, various protocols describe the flow of the communication session (call) in terms of the involved parties, intermediary VoIP entities (e.g. VoIP proxies, routers) and the characteristics of the VoIP service features. The media plane typically carries the media information (e.g. audio and/or video data) between the involved parties. Neither the media plane nor the signaling plane alone is sufficient to implement and provide a VoIP service. On the signaling plane, protocols such as SIP (see IETF RFC 3261, “SIP: Session Initiation Protocol”, available at http://www.ietf.org) or ITU-T recommendation H.323 (see H.323, “Packet-based multimedia communications systems”, Edition 7, 2009, available at http://www.itu.int) are commonly used, whereas protocols such as RTP (Real-time Transport Protocol, see IETF RFC 3550, “RTP: A Transport Protocol for Real-Time Applications”, available at http://www.ietf.org), MSRP (see IETF RFC 4975, “The Message Session Relay Protocol (MSRP)”, available at http://www.ietf.org) or ITU-T recommendation T.38 (see T.38, “Procedures for real-time Group 3 facsimile communication over IP networks”, Edition 5 (2007) or Edition 6 (2010), available at http://www.itu.int) may be present on the media plane.
In contrast to the traditional PSTN (Public Switched Telephone Network), the two planes may run on different infrastructure, use different protocols and even take different routes through the network.
Measuring Voice Quality
Today, two main categories of methods exist for measuring voice quality. The first is the subjective method, in which real human test persons express their opinion about the voice quality they perceive. The average quality rating of all test persons is expressed as the Mean Opinion Score (MOS). The MOS is given on an Absolute Category Rating (ACR) scale, which defines five points: 5 (excellent), 4 (good), 3 (fair), 2 (poor) and 1 (bad). In an attempt to obtain repeatable measurement results, the industry recommendation ITU-T P.800 (see http://www.itu.int/rec/T-REC-P.800-1996084) was defined, which provides normative speech samples to be used for the subjective test method. The results of the subjective test method are further separated into listening quality and conversational quality, which is expressed by further specifying the type of the MOS:
MOSLQS (Listening Quality, Subjective)
MOSCQS (Conversational Quality, Subjective)
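Since the MOS is simply the arithmetic mean of the individual ACR votes, the calculation can be sketched in a few lines. The function name and the example votes below are hypothetical and serve only to illustrate the averaging over the five-point scale.

```python
ACR_SCALE = {5: "excellent", 4: "good", 3: "fair", 2: "poor", 1: "bad"}


def mean_opinion_score(ratings):
    """Return the MOS as the arithmetic mean of ACR votes on the 1..5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    invalid = [r for r in ratings if r not in ACR_SCALE]
    if invalid:
        raise ValueError(f"ratings outside the ACR scale: {invalid}")
    return sum(ratings) / len(ratings)


if __name__ == "__main__":
    votes = [4, 5, 3, 4, 4, 3]  # hypothetical votes from six test persons
    print(f"MOS = {mean_opinion_score(votes):.2f}")
```

Votes outside the five ACR categories are rejected rather than silently averaged, because the scale is defined only for those values.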
Because the subjective method involves human test persons, it is not suited to automation by test equipment.
The second method for measuring voice quality is the objective method, which has been designed for automated voice quality measurement by test and monitoring equipment. Its goal is to provide reliable, objective and repeatable measurement results, yielding a voice quality rating similar to that of the subjective method performed by real human beings. As with the subjective method, MOS values for listening and conversational quality have been defined:
MOSLQO (Listening Quality, Objective)
MOSCQO (Conversational Quality, Objective)
Voice quality may also be estimated using methods such as those known from the industry standards ITU-T G.107 (E-model) and ITU-T P.564, discussed below. When subjective voice quality is estimated, this is typically indicated in the index of the MOS value, where an “E” denotes an estimated result:
MOSLQE (Listening Quality, Estimated)
MOSCQE (Conversational Quality, Estimated)
Intrusive and Non-Intrusive Monitoring of Voice Quality
The objective MOS can be measured following two very different approaches. The first is an intrusive or active method, in which the speech samples defined in ITU-T P.800 are encoded by a VoIP sender, transferred over the packet-based IP network and then decoded by the VoIP receiver. The MOS is then calculated by comparing the known speech input signal of the VoIP sender with the speech signal received by the receiver. The method is called intrusive or active because the test signal is transferred in addition to any other VoIP traffic present on the network. Active VoIP monitoring can be used for VoIP readiness tests prior to the deployment of a VoIP infrastructure, because no other VoIP traffic is required: the test equipment generates the test data used for the measurement itself. Active testing is defined by the industry recommendations ITU-T P.862 PESQ (see http://www.itu.int/rec/T-REC-P.862-2001024) and ITU-T P.862.1 (see http://www.itu.int/rec/T-REC-P.862.1-200311-I). A benefit of this method is that all factors that can have an impact on VoIP quality are taken into account, such as the VoIP endpoint, codec, noise, delay, echo and the effects of the IP network. The drawback of active testing is that the quality of real calls made by real users is not measured. Because VoIP impairments in IP networks are often transient, the results of active testing may well not reflect the quality experienced by real users.
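ITU-T P.862.1 maps the raw score produced by the P.862 (PESQ) comparison, which ranges from -0.5 to 4.5, onto the MOS-LQO scale using a logistic function. The sketch below covers only this final mapping step, not the PESQ signal comparison itself. The coefficients are those of the P.862.1 mapping function as it is commonly quoted, and the function name is hypothetical, so check both against the recommendation before relying on them.

```python
import math


def pesq_raw_to_mos_lqo(raw: float) -> float:
    """Map a raw P.862 PESQ score to MOS-LQO with the P.862.1 logistic function.

    Coefficients as commonly quoted from ITU-T P.862.1 (verify against the text).
    """
    return 0.999 + (4.999 - 0.999) / (1.0 + math.exp(-1.4945 * raw + 4.6607))


if __name__ == "__main__":
    for raw in (-0.5, 1.0, 2.5, 4.5):
        print(f"raw PESQ {raw:4.1f} -> MOS-LQO {pesq_raw_to_mos_lqo(raw):.2f}")
```

The logistic form keeps the output strictly between 0.999 and 4.999 and increases monotonically with the raw score.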
The second approach is the passive measurement method. With passive monitoring, real VoIP calls are measured, so no artificial traffic needs to be generated. The industry standards ITU-T G.107 E-Model (see http://www.itu.int/rec/T-REC-G.107-201112-I) and ITU-T P.564 (see http://www.itu.int/rec/T-REC-P.564-200711-I) provide recommendations applicable to passive monitoring of VoIP traffic in IP networks.
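In the E-model, the individual impairments are combined into a transmission rating factor R (R = Ro - Is - Id - Ie,eff + A), which G.107 then converts into an estimated conversational MOS (MOSCQE). The sketch below shows only that conversion step; deriving R from measured loss, delay and codec parameters is outside its scope, and the function name is hypothetical.

```python
def r_factor_to_mos(r: float) -> float:
    """Convert an E-model rating factor R to an estimated MOS (MOS-CQE), per ITU-T G.107."""
    if r < 0:
        return 1.0
    if r > 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


if __name__ == "__main__":
    # R = 93.2 is the E-model rating obtained when all inputs are at their default values.
    for r in (50.0, 70.0, 93.2):
        print(f"R = {r:5.1f} -> MOS-CQE {r_factor_to_mos(r):.2f}")
```

The clamping at both ends mirrors the piecewise definition in G.107, so the estimate always stays within the 1 to 4.5 range.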
FIG. 11 provides an overview of the different measurement concepts and where they are applied. Passive monitoring measures real VoIP calls without using a reference speech signal. This also means that passive monitoring solutions are often easier to deploy, since only one location has to be visited. Since the speech payload of live calls is unknown, only statistics/metrics that are independent of the speech payload can be considered, mainly packet loss, jitter and delay.
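Jitter is one of these payload-independent metrics and can be estimated from RTP headers and arrival times alone. Below is a minimal sketch of the interarrival jitter estimator defined in RFC 3550 (section 6.4.1). It assumes the packets of a single stream are given as (arrival time in seconds, RTP timestamp) pairs, uses an 8000 Hz clock as for G.711, and ignores 32-bit timestamp wraparound. The function name is hypothetical.

```python
def rtp_interarrival_jitter(packets, clock_rate=8000):
    """Return the RFC 3550 interarrival jitter estimate in RTP timestamp units.

    packets: iterable of (arrival_time_s, rtp_timestamp) pairs in arrival order.
    Assumes no 32-bit RTP timestamp wraparound within the input.
    """
    jitter = 0.0
    prev_transit = None
    for arrival_s, rtp_ts in packets:
        # Relative transit time: arrival in timestamp units minus the RTP timestamp.
        transit = arrival_s * clock_rate - rtp_ts
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            # Exponential smoothing with gain 1/16, as specified in RFC 3550.
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter


if __name__ == "__main__":
    # Packet 1 arrives 5 ms late; with an 8000 Hz clock that is 40 timestamp units.
    trace = [(0.000, 0), (0.025, 160), (0.040, 320)]
    j = rtp_interarrival_jitter(trace)
    print(f"jitter: {j:.3f} units = {j / (8000 / 1000):.3f} ms")
```

Because only differences of transit times are used, the unknown clock offset between sender and probe cancels out, which is what makes this metric usable at a passive mid-point.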
A single-server, non-intrusive, passive monitoring system capable of implementing the ITU-T G.107 standard considers the effects that are visible to the monitoring system through deep packet inspection and packet flow analysis.
A minimal non-intrusive, passive monitoring system consists of a monitoring probe, a test access port (TAP) that connects the probe to the network to be tested, and optionally a post-processing platform that visualizes the measurement results of the monitoring probe. A TAP is a passive network device that mirrors network traffic without interfering with the original traffic, by creating a copy of every IP packet. It provides a copy of every packet sent or received on the network by separating the full-duplex network link into two half-duplex network links, which are then connected to specialized packet capture cards (network interface cards, NICs) installed in the probe. These specialized packet capture cards can receive and process packets on the physical interface and deliver them to the application layer with almost no CPU processing time or operating system involvement.
FIG. 12 shows an exemplary passive, non-intrusive monitoring solution deployed in a VoIP network and indicates possible mid-point monitoring locations (TAP positions) within a carrier network. Optionally, multiple monitoring probes can be deployed in the network so that RTP streams can be evaluated end-to-end. Furthermore, the impact of installed network hardware, such as a session border controller (SBC) or a media gateway, on RTP stream quality can be analyzed.
As mentioned above, passive non-intrusive monitoring solutions for VoIP traffic are based on packet flow analysis of the RTP streams used to transfer speech over IP networks. This analysis can be performed either as an integral part of a VoIP device, such as an IP phone or a media gateway, or at a mid-point somewhere in the network between the calling parties. Both approaches have advantages and disadvantages. If the analysis is integrated into a VoIP device, additional important information becomes available to the packet flow analyzer, such as the size of the de-jitter buffer and whether received packets are passed on for further processing or discarded due to late arrival (large jitter). The availability of this information can be a major advantage in accurately estimating the VoIP quality experienced by the end user of the device.
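To illustrate why the de-jitter buffer size matters, the sketch below models a hypothetical fixed-delay de-jitter buffer whose playout schedule is anchored to the first packet of the stream, and counts the packets that would arrive after their playout deadline. Real devices often use adaptive buffers, so the fixed-buffer model, the function name and its parameters are assumptions for illustration only. A mid-point probe can observe arrival times at its own location but does not know `buffer_ms`, which is exactly the information an integrated analyzer has.

```python
def late_discards(packets, buffer_ms, clock_rate=8000):
    """Count packets a simple fixed-delay de-jitter buffer would discard as late.

    packets: list of (arrival_time_s, rtp_timestamp) pairs in arrival order.
    Hypothetical model: packet i is due for playout at
    first_arrival + buffer + (ts_i - ts_0) / clock_rate.
    """
    if not packets:
        return 0
    first_arrival, first_ts = packets[0]
    buffer_s = buffer_ms / 1000.0
    discarded = 0
    for arrival_s, rtp_ts in packets:
        playout_s = first_arrival + buffer_s + (rtp_ts - first_ts) / clock_rate
        if arrival_s > playout_s:
            discarded += 1  # arrived after its playout deadline
    return discarded


if __name__ == "__main__":
    # The third packet is delayed by 30 ms and overtaken by the fourth.
    trace = [(0.00, 0), (0.02, 160), (0.07, 320), (0.06, 480)]
    for buffer_ms in (20, 40, 60):
        print(f"{buffer_ms} ms buffer: {late_discards(trace, buffer_ms)} late packet(s)")
```

The same trace yields different effective loss depending on the buffer size, which a probe without access to the device configuration can only guess.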
A disadvantage is that a device (e.g. an IP phone) may have only a limited view of the full VoIP service, because only its own incoming or outgoing calls are subject to monitoring. All other VoIP traffic directed to other endpoints would be unavailable unless the flow analysis were integrated into every IP phone in use, which is hard to achieve in practice. Another disadvantage is that VoIP devices are service-specific hardware with limited performance and few resources available for additional data processing, for which they were not designed. Packet flow analysis can be a very CPU-intensive task, and the results have to be stored somewhere. Neither CPU resources nor disk space are sufficiently available on IP phones or media gateways.
Because of these limitations, a monitoring solution based on passive mid-point monitoring as shown in FIG. 11 and FIG. 12 may be advantageous, as monitoring is performed on copies of the network traffic produced by a network TAP to which the monitoring probes are connected, as exemplified in FIG. 1. This way, the quality measurement has no impact on the real network traffic and is independent of hardware and manufacturer, while still providing a full view of all live calls transferred at the network location under test.
There are, however, some cases in which passive monitoring does not detect every problem. In those cases, a full packet capture is often the only reasonable approach to solving the issue. The amount of data that has to be processed when capturing all packets to disk is enormous, even on a 1 GBit/s Ethernet connection, and it is not economically feasible to process this amount of data in real time so that the results could be used to adapt dynamically to impairments. Practical experience has also shown that only a few VoIP calls experience quality degradations at the same time, so it would be a huge waste of resources to analyze a full packet capture only to find that the majority of the VoIP sessions have no quality issues at all.
For example, consider a 1 GBit/s full-duplex Ethernet link fully utilized by VoIP sessions that use the G.711 codec for speech encoding with a packet interval of 20 ms (160 bytes of payload in every packet). In this case, approximately 1,000,000 RTP packets per second, generated by 10,500 concurrent VoIP sessions, have to be handled. A VoIP packet received at the network interface of the monitoring probe typically consists of the MAC header (Ethernet header), the IP header, the UDP header and an RTP header, followed by the actual payload, i.e. the speech data. Overall, the size of a single RTP packet on the data link layer using the G.711 codec with a 20 ms packet interval is 214 bytes, which means that at least 214 MBytes of data have to be stored per second to capture all packets on a 1 GBit/s Ethernet link fully utilized by VoIP sessions. Depending on the capture format used, a packet capture header may need to be added to the packet data itself, further increasing the amount of data to be captured per second. The hardware and processing power required to process this volume of data are simply too great to make economic sense for a larger deployment of passive, mid-point monitoring solutions in VoIP networks.
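The figures in this example can be reproduced with a back-of-envelope calculation. The sketch below assumes minimum header sizes (Ethernet 14 bytes without VLAN tag, IPv4 20, UDP 8, RTP 12) and, as an assumption not stated in the text, that each frame additionally occupies 24 bytes of Ethernet wire overhead (4-byte FCS, 8-byte preamble/SFD, 12-byte inter-frame gap) that is not stored in the capture. The 16-byte per-packet record header of the classic libpcap file format serves as an example of a capture header. The function and key names are hypothetical.

```python
"""Back-of-envelope capture load for a G.711 VoIP link (illustrative only)."""

# Minimum header sizes in bytes (no VLAN tag, no IP options, no RTP CSRC/extension).
ETH_HEADER = 14
IPV4_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12

# Assumption: per-frame Ethernet wire overhead that is not stored in the capture
# (4-byte FCS + 8-byte preamble/SFD + 12-byte inter-frame gap).
ETH_WIRE_OVERHEAD = 4 + 8 + 12

# Example capture-format overhead: the classic libpcap per-packet record header.
PCAP_RECORD_HEADER = 16

G711_BYTES_PER_MS = 8  # 64 kbit/s = 8 bytes per millisecond


def capture_load(link_bps: float = 1e9, ptime_ms: int = 20) -> dict:
    """Estimate packet rate, session count and capture volume for a fully
    utilized full-duplex link carrying only G.711 RTP streams."""
    payload = G711_BYTES_PER_MS * ptime_ms
    frame = ETH_HEADER + IPV4_HEADER + UDP_HEADER + RTP_HEADER + payload
    wire = frame + ETH_WIRE_OVERHEAD
    # Full duplex: each of the two directions carries link_bps.
    packets_per_s = 2 * link_bps / (wire * 8)
    # One RTP stream per direction, each sending 1000 / ptime_ms packets per second.
    packets_per_session = 2 * 1000 / ptime_ms
    return {
        "frame_bytes": frame,
        "packets_per_s": packets_per_s,
        "sessions": packets_per_s / packets_per_session,
        "capture_MB_per_s": packets_per_s * frame / 1e6,
        "capture_MB_per_s_pcap": packets_per_s * (frame + PCAP_RECORD_HEADER) / 1e6,
    }


if __name__ == "__main__":
    for key, value in capture_load().items():
        print(f"{key}: {value:,.1f}")
```

With the default parameters, this gives a 214-byte frame, roughly 1.05 million packets per second, about 10,500 concurrent sessions and about 225 MB/s of packet data (about 242 MB/s with pcap record headers). These values are in line with the approximate figures given in the text.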
FIGS. 11, 12 and 1 discussed above show the basic setup of a passive monitoring system. Typically, passive monitoring systems determine (transport-related) metrics of the monitored media streams, such as packet interarrival time, packet jitter, packet loss and packet delay. A more advanced passive monitoring system, which allows metrics to be determined and further analyzed on a time-interval basis in order to identify many different types of impairments and problems, is known, for example, from the applicant's application PCT/EP2012/000042.
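As a generic illustration of determining a metric on a time-interval basis, the sketch below counts expected, received and lost packets of one RTP stream per fixed interval, using 16-bit RTP sequence numbers with wraparound handling. It is not the method of PCT/EP2012/000042, whose details are not given here. Expected packets per interval are derived from the highest extended sequence number, similar to the loss accounting in RFC 3550. The function names, the one-second default interval and the input format are assumptions. Duplicate packets are not detected, so they can mask losses.

```python
def extend_seq(seq16, highest):
    """Map a 16-bit RTP sequence number to the extended value nearest to `highest`."""
    if highest is None:
        return seq16
    candidate = (highest & ~0xFFFF) | seq16
    # Choose the wrap cycle that keeps the candidate within half the sequence space.
    if candidate - highest > 0x8000:
        candidate -= 0x10000
    elif highest - candidate > 0x8000:
        candidate += 0x10000
    return candidate


def loss_per_interval(packets, interval_s=1.0):
    """Return {interval_index: (expected, received, lost)} for one RTP stream.

    packets: (arrival_time_s, rtp_seq16) pairs in arrival order.
    """
    results = {}
    highest = None   # highest extended sequence number seen so far
    base = None      # highest extended sequence number at the end of the previous interval
    current = None   # index of the interval being accumulated
    received = 0
    for arrival_s, seq16 in packets:
        idx = int(arrival_s // interval_s)
        ext = extend_seq(seq16, highest)
        if highest is None:
            base, highest = ext - 1, ext
        if current is not None and idx != current:
            # Close the previous interval before accounting for this packet.
            expected = highest - base
            results[current] = (expected, received, max(expected - received, 0))
            base, received = highest, 0
        current = idx
        highest = max(highest, ext)
        received += 1
    if current is not None:
        expected = highest - base
        results[current] = (expected, received, max(expected - received, 0))
    return results


if __name__ == "__main__":
    # 2 s of a 50 packet/s stream with sequence numbers 10, 11 and 60 lost.
    lost = {10, 11, 60}
    trace = [(seq / 50, seq) for seq in range(100) if seq not in lost]
    for idx, (expected, received, n_lost) in sorted(loss_per_interval(trace).items()):
        print(f"interval {idx}: expected {expected}, received {received}, lost {n_lost}")
```

A packet lost at an interval boundary is attributed to the interval in which the next higher sequence number arrives, so no loss falls between intervals.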