This section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present invention that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
Video delivery over IP networks is unreliable. A requirement for video quality modeling (VQM) is to rate the quality degradation caused by IP transmission impairments (e.g., packet loss, delay, jitter), in addition to that caused by video compression. The artifacts are evaluated after error concealment (EC) has been applied at the decoder, since the result should relate to the video quality perceived by a viewer. The goal of EC is to estimate missing macroblocks (MBs) that arise from bit erasure or packet loss, so as to minimize the perceptual quality degradation. Thus, accurate prediction of EC effectiveness is a fundamental part of a VQM that measures transmission impairment.
EC methods are either spatial (e.g., bilinear interpolation, mainly for I-frames) or temporal (e.g., estimation of lost motion vectors (MVs), mainly for B- and P-frames). VQM technologies can be categorized into packet-layer models, bitstream-level models, media-level models and hybrid models. ITU-T SG12/Q14 considers a bitstream-level no-reference video quality (VQ) assessment model to predict the impact of observed coding and IP network impairments on quality in mobile streaming and IPTV applications. It predicts a Mean Opinion Score (MOS) using the bitstream information, information contained in packet headers, prior knowledge about the media stream and buffering information from the client. H. Rui, C. Li, and S. Qiu in “Evaluation of packet loss impairment on streaming video”, J. Zhejiang Univ.-Sci. A, Vol. 7, pp. 131-136 (January 2006) propose a VQM model that uses strong spatial discontinuities as hints of packet loss and is based on decoded pixel information. However, this information is not available at the bitstream level.
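For illustration only, spatial concealment of a lost MB by interpolation from its bordering pixels may be sketched as follows. This is a simplified sketch and not the concealment of any particular decoder: the function name `conceal_block`, the 16x16 block size, the 48x48 test frame and the distance-based weights are assumptions made for the example, and the lost MB is assumed not to touch the frame border.

```python
def conceal_block(frame, x0, y0, size=16):
    """Fill a lost size x size block in place, interpolating each pixel
    from the four border pixels in its row and column.

    frame is a list of rows of grey levels; (x0, y0) is the top-left
    corner of the lost block, which must not touch the frame border.
    """
    for dy in range(size):
        for dx in range(size):
            top = frame[y0 - 1][x0 + dx]
            bottom = frame[y0 + size][x0 + dx]
            left = frame[y0 + dy][x0 - 1]
            right = frame[y0 + dy][x0 + size]
            # Each neighbour is weighted by the distance to the opposite
            # border, so the closer neighbour contributes more.
            w_top, w_bottom = size - dy, dy + 1
            w_left, w_right = size - dx, dx + 1
            total = w_top + w_bottom + w_left + w_right
            value = (top * w_top + bottom * w_bottom
                     + left * w_left + right * w_right) / total
            frame[y0 + dy][x0 + dx] = round(value)
    return frame


# Hypothetical test frame: a horizontal ramp with grey level 2 * x.
original = [[2 * x for x in range(48)] for _ in range(48)]

# Simulate the loss of one MB at (16, 16) by zeroing its pixels.
damaged = [row[:] for row in original]
for y in range(16, 32):
    for x in range(16, 32):
        damaged[y][x] = 0

concealed = conceal_block(damaged, 16, 16)
```

On this smooth ramp the interpolation restores the lost block exactly. Content containing edges or texture is not restored in this way, which is one reason the effectiveness of EC varies from one MB to another.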
T. Yamada, Y. Miyamoto, and M. Serizawa in “No-reference video quality estimation based on error-concealment effectiveness”, Packet Video, 288-293, (2007) describe a no-reference hybrid VQM that uses both bitstream-level information and decoded pixel information. It maps the number of MBs for which error concealment is determined to be ineffective to a MOS value.
A problem with the above methods is that they do not work well when a lost MB is not stationary, which happens quite often in realistic video sequences.
A. R. Reibman, V. A. Vaishampayan and Y. Sermadevi in “Quality monitoring of video over a packet network”, IEEE Transactions on Multimedia, 6(2), 327-334, (2004) use a no-reference bitstream-level VQM to estimate the mean squared error (MSE) of an error-concealed video sequence in case of transmission impairment. The model uses estimates of statistical parameters obtained from the received video bitstream on a macroblock basis (such as the DC and AC components of the DCT of I-frame MBs and the motion vectors of P- and B-MBs). One problem of this model is that it uses MSE as the target visual quality metric, instead of the subjective MOS. It is well known that MSE is not a good metric for subjective video quality, especially for measuring quality degradation caused by transmission impairment.
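The limitation of MSE can be made concrete with a small numerical sketch. The frame size, grey levels and error magnitudes below are hypothetical values chosen for the example and are not taken from any of the cited works. A small error spread over every pixel and a larger error confined to a single 16x16 MB, as might remain after ineffective concealment, yield the same MSE.

```python
def mse(ref, test):
    """Mean squared error between two equally sized grey-level frames,
    each given as a list of rows."""
    total = 0.0
    count = 0
    for ref_row, test_row in zip(ref, test):
        for r, t in zip(ref_row, test_row):
            total += (r - t) ** 2
            count += 1
    return total / count


WIDTH, HEIGHT, MB = 64, 64, 16  # hypothetical frame and macroblock sizes

reference = [[128] * WIDTH for _ in range(HEIGHT)]

# Case A: every pixel of the frame is off by 4 grey levels.
spread = [[p + 4 for p in row] for row in reference]

# Case B: one MB is off by 16 grey levels, the rest of the frame is exact.
concentrated = [row[:] for row in reference]
for y in range(MB):
    for x in range(MB):
        concentrated[y][x] += 16

mse_spread = mse(reference, spread)
mse_concentrated = mse(reference, concentrated)
print(mse_spread, mse_concentrated)
```

Both cases give an MSE of 16, although the error is distributed very differently over the frame. The sketch shows only that MSE does not distinguish the two cases; how a viewer would rate each of them is a matter for subjective testing.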