Block-based coding is the dominant video encoding technology, with codec standards such as H.263, MPEG-4 Visual, MPEG-4 AVC (a.k.a. H.264) and the emerging HEVC/H.265 standard being developed in JCT-VC. These codecs use different types of pictures (which employ different types of prediction) to compress the video as efficiently as possible. An intra-coded picture (I-picture) may only be predicted spatially from areas within the picture itself. A predictive picture (P-picture) is predicted from previously coded I- or P-pictures. A bidirectional predictive picture (B-picture) is predicted from previous pictures, subsequent pictures, or both.
Predicting a particular picture from other pictures is likely to produce a result that differs from the original version of that picture. The differences between the prediction of a picture and the original picture are called the picture's residual data. The residual data is transmitted or stored together with the prediction instructions in order to allow the original picture to be accurately reconstructed. In order to make the residual data more compact, the data is transformed into the frequency domain. In H.264/AVC the Hadamard transform and a transform similar to a Discrete Cosine Transform are used on 4×4 or 2×2 (chroma) blocks for this purpose. Once the residuals have been transformed, the transformed data is quantized to limit the amount of data that needs to be transmitted or stored. It is in this step that the actual lossy compression takes place. The level of quantization is determined by the quantization parameter (QP) and a corresponding look-up table. The QP is set at picture level but can also be altered at macroblock (MB) level.
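The residual and transform steps described above can be illustrated with a minimal sketch. It assumes a plain 4×4 Hadamard transform applied to a residual block. It is not the exact H.264/AVC integer transform, and the function names (residual, hadamard_forward, hadamard_inverse) and the sample block values are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: residual computation followed by a plain
# 4x4 Hadamard transform. Not the normative H.264/AVC transform.

# 4x4 Hadamard matrix with +1/-1 entries; H4 times its transpose is 4 * I.
H4 = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def residual(original, prediction):
    """Residual data: sample-wise difference between original and prediction."""
    return [
        [o - p for o, p in zip(orow, prow)]
        for orow, prow in zip(original, prediction)
    ]


def hadamard_forward(block):
    """Forward transform Y = H * X * H^T (no normalisation)."""
    return matmul(matmul(H4, block), transpose(H4))


def hadamard_inverse(coeffs):
    """Inverse transform X = (H^T * Y * H) / 16, exact for integer input."""
    x = matmul(matmul(transpose(H4), coeffs), H4)
    return [[v // 16 for v in row] for row in x]


if __name__ == "__main__":
    # Hypothetical sample values for a 4x4 block and a flat prediction.
    original = [
        [52, 55, 61, 66],
        [70, 61, 64, 73],
        [63, 59, 55, 90],
        [67, 61, 68, 104],
    ]
    prediction = [[50] * 4 for _ in range(4)]
    res = residual(original, prediction)
    coeffs = hadamard_forward(res)
    print("residual:    ", res)
    print("coefficients:", coeffs)
    print("round trip ok:", hadamard_inverse(coeffs) == res)
```

Without quantization the transform is lossless, as the round trip shows. A perfectly flat residual block produces a single non-zero (DC) coefficient, which is why transformed residuals tend to be more compact than the residual samples themselves.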
A higher QP means that the residual data is more coarsely quantized, so less detail of the residual data is captured and the reconstructed picture will match the original picture less well. A lower QP means that more detail of the residual data is captured, so the reconstructed picture will match the original picture more closely.
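The relationship between QP and reconstruction accuracy can be sketched as follows. The sketch assumes the commonly cited H.264/AVC quantizer step sizes, which double for every increase of 6 in QP, and uses simple floating-point division and rounding. A real codec performs this step in integer arithmetic with scaling tables, so this is an approximation for illustration only. The function names and the coefficient values are hypothetical.

```python
# Illustrative sketch only: scalar quantization of transform coefficients
# controlled by a QP in the range 0..51.

# Step sizes for QP 0..5; the step size doubles for every increase of 6 in QP.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Approximate quantizer step size for a given QP."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return QSTEP_BASE[qp % 6] * (2 ** (qp // 6))


def quantize(coeffs, qp):
    """Map each coefficient to an integer level (this is the lossy step)."""
    step = qstep(qp)
    return [round(c / step) for c in coeffs]


def dequantize(levels, qp):
    """Reconstruct approximate coefficients from the integer levels."""
    step = qstep(qp)
    return [level * step for level in levels]


def reconstruction_error(coeffs, qp):
    """Sum of absolute differences between original and reconstructed values."""
    recon = dequantize(quantize(coeffs, qp), qp)
    return sum(abs(c - r) for c, r in zip(coeffs, recon))


if __name__ == "__main__":
    coeffs = [100.0, -37.0, 12.0, 3.0]  # hypothetical transformed residuals
    for qp in (4, 22, 40):
        levels = quantize(coeffs, qp)
        print(f"QP={qp:2d} step={qstep(qp):6.2f} levels={levels} "
              f"error={reconstruction_error(coeffs, qp):.2f}")
```

With these sample values, the higher QP produces more zero-valued levels (fewer bits to code) at the cost of a larger reconstruction error, which is the trade-off described above.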
It is fairly expensive in terms of bits for an encoder to alter the QP at MB level, and so the majority of changes in QP occur between pictures. When the quantization has been performed, the quantized transformed residuals are encoded using entropy coding. In H.264/AVC, one of two entropy coding algorithms, namely CAVLC (Context Adaptive Variable Length Coding) or CABAC (Context Adaptive Binary Arithmetic Coding), is used for this.
To increase error resilience in error-prone networks, I-pictures are usually inserted periodically to refresh the video. I-pictures are also inserted periodically to allow for random access and channel switching. When the forced intra pictures are inserted is not defined by the video coding standard, but is up to the encoder to decide. Typically, video coding standards define the video bitstream syntax and the decoding process, but do not define the encoding process. In other words, the method by which the video sequence is encoded is not standardized, whereas the output of the encoding process is standardized.
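Because the insertion policy is left to the encoder, one simple possibility is a fixed intra period. The sketch below assumes such a policy and uses only I- and P-pictures. The function names and the intra_period parameter are hypothetical and do not correspond to any particular encoder.

```python
# Illustrative sketch only: one possible encoder-side policy that forces an
# I-picture at a fixed interval. The standard does not mandate this policy.


def assign_picture_types(num_pictures, intra_period):
    """Return a list of picture types with an I-picture every intra_period."""
    if intra_period < 1:
        raise ValueError("intra_period must be at least 1")
    return ["I" if i % intra_period == 0 else "P" for i in range(num_pictures)]


def random_access_point(types, target):
    """Index of the closest I-picture at or before the target picture.

    A decoder joining the stream (random access, channel switching) has to
    start decoding from such a picture to display the target correctly.
    """
    for i in range(target, -1, -1):
        if types[i] == "I":
            return i
    raise ValueError("no I-picture at or before the target picture")


if __name__ == "__main__":
    types = assign_picture_types(num_pictures=10, intra_period=4)
    print("".join(types))
    print("start decoding at picture", random_access_point(types, 6))
```

A shorter intra period gives faster error recovery and random access at the cost of more bits, since I-pictures are typically larger than predicted pictures.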
To ensure the end-to-end quality of video over fixed and mobile networks, network operators and broadcast vendors can utilize objective video quality models. Objective video quality models are mathematical models that approximate the results of subjective quality assessment, but are based on criteria and metrics that can be measured objectively and automatically evaluated by a computer program.
The performance of an objective video quality model is evaluated by computing a metric between the objective score generated by the objective video quality model and subjective test results. This metric can be, for example, the correlation between subjective and objective data or the mean squared error. A subjective test result may comprise a mean opinion score (MOS) obtained from the opinions of a plurality of human test subjects.
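The evaluation described above can be sketched with the two metrics mentioned, Pearson correlation and mean squared error, computed between model scores and MOS values. The ratings and model scores below are invented purely for illustration, and the function names are hypothetical.

```python
# Illustrative sketch only: comparing objective model scores against
# subjective mean opinion scores (MOS). All data values are made up.
import math


def mean_opinion_score(ratings):
    """MOS for one video sequence: the mean of the test subjects' ratings."""
    return sum(ratings) / len(ratings)


def pearson_correlation(x, y):
    """Pearson linear correlation coefficient between two score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def mean_squared_error(x, y):
    """Mean squared error between two score lists."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    # Hypothetical ratings (1..5 scale) from five subjects for four sequences.
    ratings_per_sequence = [
        [5, 4, 5, 4, 5],
        [4, 3, 4, 4, 3],
        [2, 3, 2, 3, 3],
        [1, 2, 1, 1, 2],
    ]
    mos = [mean_opinion_score(r) for r in ratings_per_sequence]
    model_scores = [4.4, 3.9, 2.8, 1.7]  # hypothetical objective scores
    print("MOS:        ", mos)
    print("correlation:", pearson_correlation(model_scores, mos))
    print("MSE:        ", mean_squared_error(model_scores, mos))
```

A correlation close to 1 together with a low mean squared error would indicate that the model tracks the subjective results well. The correlation alone does not capture a constant offset between the two scales, which is one reason both metrics may be reported.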
Perceptual models may be considered to be a subset of objective video quality models. Whereas an objective video quality model can refer to any automated quality assessment method, a perceptual model attempts to determine to what extent any quality defects would be perceived by a viewer. Perceptual models can utilize the pixel information in the decoded video sequence, and in the case of full-reference models the reference signal may also be used to predict the degradation of the processed video. A significant disadvantage of perceptual models is that they are usually computationally demanding and not very suitable for large-scale deployment for network monitoring purposes.
An alternative approach to perceptual quality models that is more light-weight than a full-reference model is to use network layer protocol headers as input for quality estimation of a transmitted video. This approach makes the model very efficient to implement and use, but the quality estimation of the transmitted video will be rather coarse. Therefore a video bitstream quality model may also be implemented. This model takes the encoded elementary stream as input in addition to network protocol headers and has the advantage that it will be fairly light-weight and yet has the potential of getting a better estimate of the quality of the video than one just using network layer protocol headers. Such a video bitstream quality model may operate in two modes, one mode where full decoding of the bitstream is allowed and another, lower complexity, mode where full decoding is not allowed (such that pixel information cannot be used).
“Modeling the impact of frame rate and quantization stepsizes and their temporal variations on perceptual video quality: a review of recent works” by Y-F. Qu, Z. Ma, and Y. Wang, published in Conference on Information Sciences and Systems (CISS), 2010 44th Annual Conference, 17-19 Mar. 2010, describes the effect of frame rate and quantization stepsize, as well as the temporal variation of the frame rate, on the perceptual quality.
“Evaluation of temporal variation of video quality in packet loss networks” by C. Yim and A. Bovik, published in Signal Processing: Image Communication 26 (2011), describes the effect that variations in the temporal quality of videos have on global video quality.
There is thus a need for quality defect detection that does not require decoding of the encoded video bitstream. Such a quality defect detection may be suitable for implementation with a quality model.