Among the numerous video distribution services, IP-based video, as used for example for IPTV (Internet Protocol TV), is becoming increasingly important and is progressively replacing analogue or non-packet-based transmission methods. It is a major responsibility of the broadcast provider towards both content provider and customer to maintain a high level of quality of its service. In large-scale video distribution networks, such as those involved in IPTV services, only fully automated quality monitoring probes can fulfil this requirement.
In order to achieve a high degree of satisfaction of the user of video services such as non-interactive streaming video (IPTV, VoD) or static video (DVD), such monitoring probes need to provide, besides technical performance indicators, estimates of the video quality that the users of the services perceive during a given monitoring interval.
To this aim, technical video quality models are developed which provide instrumental estimates of the video quality as perceived by the user, and thus are technical models of the user. For instance, those models can output the degree of similarity between the video received at the user side and the original, non-degraded video. In addition, representing a more sophisticated solution, the Human Visual System (HVS) can be modelled using a technical system. Finally, such a model shall provide quality estimates that correspond to ratings given by users, which is achieved by training the technical models on the results of extensive subjective quality tests.
Video quality models and thus measurement systems are generally classified as follows:
Quality Model Types
    Full Reference (FR): a reference signal is required.
    Reduced-Reference (RR): partial information extracted from the source signal is required.
    No-Reference (NR): no reference signal is required.
Input Parameter Types
    signal/media-based: the decoded image (pixel-information) is required.
    parameter-based: bitstream-level information is required. Information can range from packet-header information, requiring parsing of the packet-headers only, over partial to full decoding of the bitstream.
Type of Application
    Network Planning: the model or measurement system is used before the implementation of the service in the planning phase, in order to identify the best possible implementation.
    Service Monitoring: the model is used during service operation.
Related information on the types of video quality models can be found in references [1], [2], or [3].
In the context of IPTV, the main distortions are caused by video compression and video packet loss. Elements influencing the perceived video quality in case of video packet loss are:
    a) The number of lost packets.
    b) The packet loss distribution, which can, for example, be described in terms of the average number of lost packets in a given loss burst, and the distribution of such bursts.
    c) The GOP-structure, including
        i) The GOP-length, i.e., the distance between frames which do not require previous or further frames to be decoded, the so-called “key-frames” or “I-frames”. One Group of Pictures (GOP) covers one I-frame and all frames until the next I-frame of the video sequence.
        ii) The number and distribution of B- and P-frames in each GOP, that is, predicted (P-) and bidirectional (B-) frames.
        iii) The GOP “type”: open-GOP or closed-GOP. When the GOP is open, frames belonging to one GOP may be encoded using reference frames from the following or previous GOP; when the GOP is closed, only reference frames from the current GOP can be used as reference for encoding frames of the current GOP.
    d) The frame type of the frame impaired by packet loss. If the loss occurs in an I-frame or a P-frame, the loss is propagated to all frames referencing the impaired frame, typically until the next (reference) I-frame, while if the loss occurs in a B-frame, the loss is not propagated, except in the case of hierarchical B-frame coding. In case of hierarchical coding, some of the B-frames are also used as reference frames for other B-frames. Losses in reference B-frames are thus propagated to the dependent B-frames.
    e) The number of packets per frame. This number depends on the bitrate and on the spatio-temporal complexity of the video. The higher the bitrate, the more packets are required to transmit the frame. The spatio-temporal complexity of the video influences the distribution of packets among frames: basically, the higher the spatial complexity of the video is, the more packets are required for I-frames or P- and B-frames (if spatially/intra predicted macroblocks are required to capture the information), and the higher the temporal complexity of the video is, the more packets are required to transmit P- and B-frames. In turn, the higher the number of packets per frame, the fewer pixels each packet contains. Considering a certain loss probability, the more packets a frame contains, the higher will be the probability of having packet loss in this frame, and the higher will be the probability that the loss propagates if this frame is a reference frame.
    f) The packet-loss-concealment, i.e. the strategy implemented in the decoder for concealing the loss.
       Packet-loss-concealment can coarsely be categorized in terms of slicing or freezing. A slice is defined as an area of the video frame which can be decoded independently. Thus, if a slice is affected by a packet loss, the decoder fills this area with data from (spatially or temporally) neighbouring correctly received areas. Slicing needs to be implemented by the encoder, which introduces the slice-headers the decoder will use as synchronization-points. In case of packet loss and freezing-type loss concealment, the last correctly received video frame is repeated, typically until the next intact I-frame arrives, or another intact reference frame from which the affected frame is predicted. In broadcast-services, freezing includes skipping the erroneous frames. In non-broadcast services, lost packets may be resent and played out even after a delayed reception. This can be considered as a re-buffering, and the missing information is not skipped. Note that the latter case is not considered by this invention.
    g) If slicing is used as packet-loss-concealment, the number of slices per frame (see FIG. 2). The number of slices per frame is selected at the encoder stage. In case of packet loss and if slicing is used as packet-loss-concealment, this number influences the spatial extent of the loss. Indeed, if a packet loss occurs in a slice, the loss is propagated until the next slice, i.e. until the decoder can resynchronize based on the next available slice header. As a consequence, increasing the number of slices per frame reduces the spatial extent of the loss. However, this also increases the number of slice headers, and thus decreases the encoding efficiency at a given overall bitrate. This reflects that a trade-off exists between coding efficiency and robustness to packet-loss.
    h) The rate control type employed by the encoder, that is, constant versus variable bitrate coding.
Specifically, the type of rate control (constant or variable bitrate coding) employed by the encoder, together with the spatio-temporal complexity of the content, strongly affects the mapping of the spatio-temporal information into bytes or, in other terms, the number of packets required for a given spatio-temporal area. Note that the present invention targets both constant and variable bitrate coding. However, in strongly variable bitrate coding cases, estimating the spatio-temporal extent of loss events from header information is less valid, so the quality predictions provided by the technical user model described in this invention will be less close to the actual perception.
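Items d), e) and g) of the list above can be illustrated numerically. The following Python sketch is not part of the described model; it rests on simplifying assumptions that are not stated in the text: packet losses are independent, all packets of a frame carry the same share of the picture, slices are of equal size and aligned with packet boundaries, the GOP is closed, and B-frames are not used as references. The function names are hypothetical.

```python
def prob_frame_hit(packets_per_frame: int, loss_prob: float) -> float:
    """Probability that at least one packet of a frame is lost.

    Assumption: packet losses are independent with probability loss_prob.
    """
    return 1.0 - (1.0 - loss_prob) ** packets_per_frame


def slice_loss_extent(packets_per_frame: int, slices_per_frame: int,
                      lost_packet_index: int) -> float:
    """Fraction of the frame impaired when one packet is lost (item g).

    Assumption: equal-size slices aligned with packet boundaries, so the
    damage runs from the lost packet to the end of its slice, where the
    decoder resynchronizes on the next slice header.
    """
    if packets_per_frame % slices_per_frame != 0:
        raise ValueError("sketch assumes slices aligned with packets")
    packets_per_slice = packets_per_frame // slices_per_frame
    slice_index = lost_packet_index // packets_per_slice
    slice_end = (slice_index + 1) * packets_per_slice
    return (slice_end - lost_packet_index) / packets_per_frame


def frames_affected(gop: str, hit_index: int) -> int:
    """Number of frames impaired by a loss in frame hit_index (item d).

    Assumption: closed GOP given as a string such as "IBBPBBPBBPBB",
    non-hierarchical B-frames. A loss in a B-frame stays in that frame;
    a loss in an I- or P-frame lasts until the end of the GOP.
    """
    if gop[hit_index] == "B":
        return 1
    return len(gop) - hit_index


if __name__ == "__main__":
    # More packets per frame: higher chance that the frame is hit.
    print(prob_frame_hit(1, 0.01), prob_frame_hit(10, 0.01))
    # More slices per frame: smaller spatial extent of the same loss.
    print(slice_loss_extent(12, 1, 5), slice_loss_extent(12, 6, 5))
    # Loss in a P-frame propagates, loss in a B-frame does not.
    print(frames_affected("IBBPBBPBBPBB", 3), frames_affected("IBBPBBPBBPBB", 1))
```

Under these assumptions, the sketch reproduces the qualitative statements of the list: the hit probability grows with the number of packets per frame, the impaired area shrinks as the number of slices grows, and only losses in reference frames extend over several frames.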
Quality estimation methods commonly support a separate estimation of the quality related to the coding (compression, Qcod) of the video signal, and the quality due to packet loss during transmission (Qtrans). Quality estimation methods commonly use one of two approaches to combine an estimation concerning the quality of the compression and the transmission quality. Equations (1) and (2) illustrate the two different approaches, where the respective value-ranges represent exemplary implementations:
\[ Q = Q_0 - Q_{cod} - Q_{trans}, \qquad Q_0, Q_x \in [0 \ldots 100] \tag{1} \]
\[ Q = Q_0 \cdot Q_{cod} \cdot Q_{trans}, \qquad Q_0, Q_x \in [0 \ldots 1] \tag{2} \]
Here, Q0 represents the base quality, or a function of the base quality. Base quality here refers to the perceived quality of the video before encoding, transmission and decoding.
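The two combination approaches of equations (1) and (2) can be sketched in Python as follows. The function names are hypothetical, and the clamping of the additive result to the range 0 to 100 is an assumption added for illustration; the text only gives the value ranges as exemplary.

```python
def combine_additive(q0: float, q_cod: float, q_trans: float) -> float:
    """Equation (1): impairments are subtracted from the base quality.

    All terms are expected on a 0..100 scale. Clamping the result to
    that range is an assumption of this sketch.
    """
    return max(0.0, min(100.0, q0 - q_cod - q_trans))


def combine_multiplicative(q0: float, q_cod: float, q_trans: float) -> float:
    """Equation (2): quality factors on a 0..1 scale are multiplied."""
    return q0 * q_cod * q_trans


if __name__ == "__main__":
    print(combine_additive(100.0, 20.0, 30.0))
    print(combine_multiplicative(1.0, 0.8, 0.5))
```

In the additive form, Qcod and Qtrans act as impairment terms (larger values mean lower quality), whereas in the multiplicative form they act as quality factors (smaller values mean lower quality).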
Quality due to packet loss (Qtrans) is commonly estimated from the bit-rate and packet-loss-rate, as in [4]. For taking into account the packet-loss-distribution, parameters describing the distribution of loss within the video sequence, such as the burst density and burst duration as in [5] or the number of packets lost in a row as in [6], are also considered. Alternatively, parameters describing the packet loss frequency (i.e. number of packet-loss events within a given time period) as in [7] have been proposed. Those parameters are helpful in case of network planning but may be insufficient in case of service monitoring. For example, they do not capture which proportion of the hit frame is impaired, since they do not consider the total number of packets and the number of lost packets in the frame hit by loss. They are thus blind to the actual spatial extent of the loss. In addition, they do not consider the frame type of the frame hit by loss, and are thus blind to the temporal propagation and thus duration of the loss.
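The limitation described above can be illustrated with a minimal Python sketch. It computes stream-level loss statistics from a sequence of per-packet loss flags and, separately, the proportion of lost packets per frame. The burst definition used here (a run of consecutively lost packets) is a simplification chosen for illustration and does not reproduce the definitions of [5], [6] or [7]; all function names are hypothetical.

```python
from itertools import groupby


def loss_statistics(lost: list[bool]) -> dict[str, float]:
    """Stream-level statistics: loss rate, loss events, mean burst length.

    Assumption: a loss event (burst) is a run of consecutive lost packets.
    """
    bursts = [sum(1 for _ in run) for is_lost, run in groupby(lost) if is_lost]
    n_lost = sum(bursts)
    return {
        "loss_rate": n_lost / len(lost) if lost else 0.0,
        "loss_events": float(len(bursts)),
        "mean_burst_length": n_lost / len(bursts) if bursts else 0.0,
    }


def per_frame_loss_fraction(lost: list[bool],
                            packets_per_frame: list[int]) -> list[float]:
    """Proportion of lost packets in each frame.

    packets_per_frame gives the (positive) number of packets of each
    consecutive frame; the loss flags are split accordingly.
    """
    fractions = []
    start = 0
    for n in packets_per_frame:
        frame = lost[start:start + n]
        fractions.append(sum(frame) / n)
        start += n
    return fractions


if __name__ == "__main__":
    lost = [True, True] + [False] * 8
    print(loss_statistics(lost))
    # Same loss pattern, different packetization of the two frames.
    print(per_frame_loss_fraction(lost, [2, 8]))
    print(per_frame_loss_fraction(lost, [8, 2]))
```

In the usage example, the stream-level statistics are identical for both packetizations, while the first frame is entirely lost in one case and only a quarter of its packets are lost in the other.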
A parameter describing the temporal duration of the loss has been proposed in [8], but this parameter covers only freezing as packet-loss-concealment. An interesting proposal has been made in [10] for estimating the area in a frame that is affected by a loss in the case that slicing is applied. There, the proposed approach is not applied to quality prediction as suggested in the present invention, and it covers only one of the several sub-cases the present invention handles. In [9], a method is proposed using an approach similar to [10] but in a somewhat different context. It uses parameters that describe the spatial extent of the loss per frame and frame type and computes the quality of the frame based on those parameters. The frame timing and loss propagation are, however, not explicitly considered in terms of one single parameter for describing the loss-induced distortion.
Nevertheless, a perceptually adequate user model needs to use a quantitative mapping between the description of the loss and the quality impact in terms of the amount of perceptual degradation. In the model of the invention, it is assumed that the combination of the spatial extent and of the duration of the loss impacts the perceived quality. As a consequence, the invention defines a parameter that accurately and explicitly describes the spatial extent and the duration of the loss and a model that maps this parameter to the quality of the video sequence accounting for the given measurement window.
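The idea of a single parameter combining spatial extent and duration can be sketched as follows. This is only an illustrative assumption, not the definition used by the invention, which is not given in this section: each loss event contributes its impaired frame fraction multiplied by the number of frames it affects, and the sum is normalized by the number of frames in the measurement window. The names `LossEvent` and `loss_magnitude` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LossEvent:
    spatial_extent: float  # impaired fraction of the frame area, 0..1
    frames_affected: int   # frames until the loss stops propagating


def loss_magnitude(events: list[LossEvent], frames_in_window: int) -> float:
    """Hypothetical spatio-temporal loss parameter for one window.

    Sums extent x duration over all loss events and normalizes by the
    window length in frames. 0 means no loss; 1 would mean every frame
    of the window is fully impaired.
    """
    if frames_in_window <= 0:
        raise ValueError("frames_in_window must be positive")
    total = sum(e.spatial_extent * e.frames_affected for e in events)
    return total / frames_in_window


if __name__ == "__main__":
    events = [LossEvent(0.25, 9), LossEvent(0.5, 1)]
    print(loss_magnitude(events, 250))
```

A mapping from such a parameter to a quality value would have to be fitted to the results of subjective quality tests; no coefficients are assumed here.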