Network Planning
A proposal for estimating the perceived audio, video and audio-visual quality during network planning is part of the framework of International Telecommunication Union Study Group 12 (ITU SG12; planned Recommendation G.OMVAS, “Opinion Model for Video and Audio Streaming applications”). Video, audio and audio-visual quality are predicted from network planning assumptions such as the chosen codecs, bit-rates and expected packet loss rates.
For speech services, the E-Model (ITU-T Rec. G.107, 1995-2005) can be used to estimate the perceived speech quality during network planning. It predicts the perceived speech quality from a combination of impairment factors, each of which is the transformation of a technical characteristic of the planned service onto a perceptual scale.
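As an illustration of how such a planning model combines impairments, the following is a minimal sketch of the E-Model's basic structure as it is generally published: a transmission rating R is obtained by subtracting impairment factors from a basic signal-to-noise term, and R is then mapped to an estimated mean opinion score (MOS). The function names, the clamping of the MOS to a minimum of 1, and all numeric inputs in the usage lines are assumptions made for illustration; they are not planning values taken from this document or from any recommendation.

```python
def effective_equipment_impairment(ie: float, ppl: float, bpl: float,
                                   burst_r: float = 1.0) -> float:
    """Codec impairment Ie increased by packet loss.

    ie      : equipment impairment of the codec without packet loss
    ppl     : packet loss probability in percent
    bpl     : packet-loss robustness factor of the codec (must be > 0)
    burst_r : burst ratio (1.0 means random, non-bursty loss)
    """
    return ie + (95.0 - ie) * ppl / (ppl / burst_r + bpl)


def transmission_rating(ro: float, i_s: float, i_d: float,
                        ie_eff: float, a: float = 0.0) -> float:
    """Combine impairments on the perceptual scale: R = Ro - Is - Id - Ie,eff + A."""
    return ro - i_s - i_d - ie_eff + a


def r_to_mos(r: float) -> float:
    """Map the transmission rating R to an estimated MOS between 1 and 4.5."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    mos = 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6
    # The polynomial dips slightly below 1 for very small R; clamping here
    # is an assumption of this sketch.
    return max(1.0, mos)


if __name__ == "__main__":
    # Hypothetical planning inputs, chosen only to exercise the functions.
    ie_eff = effective_equipment_impairment(ie=10.0, ppl=2.0, bpl=20.0)
    r = transmission_rating(ro=94.0, i_s=1.5, i_d=5.0, ie_eff=ie_eff)
    print(f"Ie,eff = {ie_eff:.2f}, R = {r:.2f}, MOS = {r_to_mos(r):.2f}")
```

The point of the sketch is that each technical characteristic (codec, packet loss, delay) is first converted into an impairment on a common perceptual scale, so that the impairments can simply be added before the final mapping to a quality score.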
Service Monitoring
Audio-visual quality is commonly computed from the measured audio quality, the measured video quality and their interaction, as described in J. G. Beerends and F. E. Caluwe, “Relations between audio, video and audio-visual quality”, 1997, in N. Chateau, “Relations between audio, video and audio-visual quality,” 1998, or in U.S. Pat. No. 7,197,452. The latter patent follows a signal-based approach, measuring the audio and video qualities on the audio and video signals themselves. The way the video and audio qualities are combined depends on the degree of motion in the video signal. Neither degradations introduced by the network, such as packet losses, nor audio-visual synchronization are considered. U.S. Pat. No. 5,596,364 also provides a method for estimating the perceived audio-visual quality that takes into account the spatial and temporal activities of the video signal, i.e. the amount of detail and the motion complexity. However, it requires the transmission of features extracted from the signal before transmission, as well as access to the destination signal, i.e. the signal at the receiving side. This involves decoding and reconstructing the signal, which requires high computation power.
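The combination of audio quality, video quality and their interaction described above can be sketched as follows. The sketch assumes that the interaction is modelled as a multiplicative term, which is one common form of such models; the class and function names and the coefficient values are hypothetical placeholders and are not taken from the cited works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvCoefficients:
    """Weights of a simple audio-visual integration model (hypothetical)."""
    offset: float
    audio: float
    video: float
    interaction: float


def audiovisual_quality(q_audio: float, q_video: float,
                        c: AvCoefficients) -> float:
    """Combine audio and video quality scores into one audio-visual score.

    Q_av = offset + audio*Q_a + video*Q_v + interaction*Q_a*Q_v
    """
    return (c.offset
            + c.audio * q_audio
            + c.video * q_video
            + c.interaction * q_audio * q_video)


if __name__ == "__main__":
    # Placeholder coefficients: only the interaction term is active, scaled
    # so that a 5-point input scale maps to at most 5. Real coefficients
    # would have to be fitted to subjective test data.
    coeffs = AvCoefficients(offset=1.0, audio=0.0, video=0.0, interaction=0.16)
    print(audiovisual_quality(q_audio=4.0, q_video=3.0, c=coeffs))
```

A model of this kind could be made dependent on the degree of motion by selecting a different coefficient set per content class, which is again an assumption of this sketch rather than a description of the cited patent.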
Measurement of Video Quality and Audio Quality
Quality measurement systems can be classified as follows:
Full-Reference (FR): the measurement system requires access to a reference signal (the source signal, assumed to have perfect quality).
Reduced-Reference (RR): the system has access to partial information extracted from the source signal.
Non-Reference (NR): the reference signal is not available.
Many FR and RR systems already exist, e.g., ITU J.144 for video and ITU-T Rec. P.862 “PESQ” for speech. However, for passive service monitoring, NR systems are the only practical choice, since they do not require any reference signal. NR systems can be applied at different points in the network, including the client, i.e., at the receiving side. For network planning, NR systems are used as well, since no signals or bit-stream information are available during planning. Especially in the context of data-prone or live services such as standard definition and high definition television, the additional transmission of the reference signal in real time is not feasible. Hence, for these services, passive monitoring systems including NR quality models may be used.
Most NR systems are signal-based and estimate the quality as perceived by a human user by analysing the signal itself at the receiving side. Such systems require high computation power, at least for the video signal, since they have to decode and reconstruct the signal. Moreover, they do not take advantage of the bit-stream analysis already performed by the decoder. These drawbacks can be circumvented by video quality measurement systems that estimate the video quality from an analysis of the bit-stream, as described in WO-A-2004/054274, which uses information at the video macro-block level.