Among the numerous TV distribution services, IPTV (Internet Protocol TV) is becoming increasingly important and is progressively replacing analogue or non-packet-based transmission methods. It is a major responsibility of the broadcast provider towards both content provider and customer to maintain the quality of its service. In large IPTV networks, only fully automated quality monitoring probes that raise an alarm when degraded audio and/or video quality occurs can fulfill this requirement. These monitoring probes should be able to estimate the subjective quality that an end user would perceive. Several models exist or are in development that translate objective measurement results from a video bit stream into so-called “mean opinion score” (MOS) values. The objective measurement categories that can be used to produce this MOS value include, for instance, the bit rate and the frame rate of the monitored video.
It is an inherent property of IP networks that IP packets may get lost, mostly due to temporary overload at some point in the network. Some of these losses may be almost invisible to the customer, while others may cause severe degradation of the video quality. Even if countermeasures against these losses are part of an IPTV distribution system, they can never guarantee 100% effectiveness. For instance, a retransmission request may take too long, or the retransmitted packet itself might get lost.
Therefore, there is always a non-zero probability that fragmentary bit streams are transmitted to the end user device. These in turn can cause visible or audible degradations in the reconstructed video or audio. Measurement categories may therefore also include values that express the probability of losses, such as a “packet loss rate” and the “burstiness of loss events”.
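The two loss-related values named above can be illustrated with a minimal sketch. It assumes that a monitoring probe already provides a per-packet loss flag, and it takes the mean length of consecutive loss runs as one possible measure of burstiness. The function name `loss_statistics` and this particular burstiness definition are illustrative assumptions, not taken from the text.

```python
from itertools import groupby


def loss_statistics(lost):
    """Return (packet_loss_rate, mean_burst_length).

    lost: sequence of booleans, True where a packet was lost
          (assumed to be supplied by the monitoring probe).
    """
    lost = list(lost)
    if not lost:
        return 0.0, 0.0

    # Fraction of all packets that were lost.
    loss_rate = sum(lost) / len(lost)

    # Lengths of all runs of consecutive lost packets.
    bursts = [sum(1 for _ in run) for is_lost, run in groupby(lost) if is_lost]

    # Assumed burstiness measure: average run length of loss events.
    mean_burst = sum(bursts) / len(bursts) if bursts else 0.0
    return loss_rate, mean_burst
```

For the same loss rate, a larger mean burst length indicates that the losses are concentrated in fewer, longer events.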
In order to be able to estimate whether a packet loss will be invisible or, on the contrary, strongly visible even for long periods of time, it is necessary to capture more properties of the monitored bit stream. The most important of these additional properties is the “frame type” of all frames and in particular of the frame affected by losses. The possible values for the “frame type” property include “Intra-Frame” or “Key-Frame” (below called I-frame), “Predicted-Frame” (below called P-frame) and “Bidirectional-Frame” (below called B-frame). It is well known that only I-frames can be decoded without knowledge of any prior frames. P-frames, on the contrary, always depend on one or more predecessors called “reference frames”, because the information transmitted for a P-frame mainly consists of the difference between the video frame it describes and its references. Therefore, packet losses within an I-frame or its consecutive P-frames are carried into every subsequent frame, because the loss-affected I- and P-frames in general serve as references for subsequent frames. These frames therefore become degraded even if they do not contain any losses themselves.
Due to this mechanism, a single packet loss error may linger through long parts of a video sequence, until the next error free I-frame occurs. Errors in P-frames and particularly in I-frames may therefore have a very high visibility.
The same reference frame mechanism applies to B-frames, but since B-frames in general do not serve as references themselves, an error in a B-frame will only be visible in this single frame and hence be much less visible than errors due to losses in I- or P-frames.
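The propagation behaviour described above can be sketched under simplifying assumptions: frames are listed in transmission (decoding) order, every P- and B-frame depends on the most recent I/P reference chain, and B-frames are never used as references. The function name `degraded_frames` and the set-based loss input are hypothetical.

```python
def degraded_frames(frame_types, lost):
    """Return the indices of frames that show degradation.

    frame_types: sequence of 'I', 'P' or 'B' in transmission order.
    lost:        set of frame indices that suffered packet loss.
    """
    degraded = []
    chain_damaged = False  # True while the I/P reference chain carries an error
    for index, frame_type in enumerate(frame_types):
        hit = index in lost
        if frame_type == "I":
            # A loss-free I-frame wipes out earlier errors; a damaged one starts new ones.
            chain_damaged = hit
        elif frame_type == "P":
            # A P-frame inherits errors from its references and may add its own.
            chain_damaged = chain_damaged or hit
        # A B-frame is degraded by its references or by its own loss,
        # but it does not pass the error on.
        if chain_damaged or hit:
            degraded.append(index)
    return degraded
```

With the frame sequence `"IPBBPBBIPBB"`, a loss in the B-frame at index 5 degrades only that frame, whereas a loss in the P-frame at index 1 degrades frames 1 to 6, until the next I-frame at index 7.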
Since I-frames do not depend on any prior references, they represent the only points in a bit stream where a video player or set-top box can sync up with the video. Also, (loss-free) I-frames are the only points in time at which any degradation due to packet losses is wiped out. The sequence of video frames between two I-frames is called a “Group of Pictures” (GoP). In most cases, the P- and B-frames in a GoP follow a more or less strict pattern, like the typical GoP pattern known from MPEG2: “I, B, B, P, B, B, P . . . ”. If this pattern is known, a reliable a priori estimation of the frame type of any picture in the bit stream is possible, even if the frame type itself cannot be read from the bit stream due to packet loss or encryption.
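As a minimal sketch of such an a priori estimation, assume that the GoP pattern repeats strictly with a fixed length and that the position of one I-frame is known. The function name `predicted_frame_type` and its parameters are hypothetical, and real streams may deviate from a strict pattern.

```python
def predicted_frame_type(pattern, known_i_index, frame_index):
    """Predict the type of frame `frame_index` from a repeating GoP pattern.

    pattern:       one full GoP as a string, e.g. "IBBPBBPBB" (must start with 'I').
    known_i_index: index of any frame known to be an I-frame.
    frame_index:   index of the frame whose type cannot be read directly.
    """
    if not pattern or pattern[0] != "I":
        raise ValueError("pattern must describe one GoP starting with an I-frame")
    # The position inside the GoP is the distance to the known I-frame, modulo GoP length.
    return pattern[(frame_index - known_i_index) % len(pattern)]
```

For the pattern `"IBBPBBPBB"` and a known I-frame at index 100, frame 103 is predicted to be a P-frame and frame 109 the next I-frame.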
It is often quite demanding to obtain good estimates for the above-mentioned and other measurement values. This is mainly due to two independent reasons:
1. To prevent unauthorized access, the bit stream might be encrypted and important bit stream properties might not be readable at the measurement location.
2. Due to packet loss as mentioned above, important pieces of information might have been removed from the bit stream.
In WO 2009/02297 and WO 2009/012302, the “pattern” of the GoP is determined solely by independently estimating the frame type of every individual video frame with the aid of adaptive threshold values that discriminate between video frames of very large size (I-frames), medium size (P-frames) and small size (B-frames). Since I-frames contain on average two to five times as many bits as P-frames or B-frames, it is easy to distinguish I-frames from P- and B-frames. Distinguishing P-frames from B-frames, however, is rather unreliable. Although B-frames are on average smaller than P-frames, the difference in size is not large, whereas the variance of P- and B-frame sizes is. In general, average differences in size also depend strongly on the specific encoder used to compress the examined video sequence and on the specific properties of that sequence. This is even more the case for a newer encoding strategy of H.264 encoders called “hierarchical coding”, where some of the B-frames also serve as references for other B-frames.
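The weakness of purely size-based, per-frame classification can be illustrated with a simplified sketch. It does not reproduce the adaptive thresholds of the cited documents. It assumes two thresholds derived from the mean frame size of the examined window, and the function name `classify_by_size` and the factors 2.0 and 1.0 are illustrative assumptions.

```python
def classify_by_size(sizes, i_factor=2.0, p_factor=1.0):
    """Label each frame 'I', 'P' or 'B' from its size alone.

    sizes:    frame sizes in bytes for one window of the stream.
    i_factor: assumed multiple of the mean size above which a frame counts as 'I'.
    p_factor: assumed multiple of the mean size above which a frame counts as 'P'.
    """
    if not sizes:
        return ""
    mean_size = sum(sizes) / len(sizes)
    labels = []
    for size in sizes:
        if size >= i_factor * mean_size:
            labels.append("I")
        elif size >= p_factor * mean_size:
            labels.append("P")
        else:
            labels.append("B")
    return "".join(labels)
```

For the sizes `[50000, 8000, 8000, 15000, 8000, 8000, 12000, 8000, 8000, 15000, 8000, 8000]`, the I-frame is found easily, but the unusually small P-frame of 12000 bytes at index 6 falls below the mean of 13000 bytes and is labelled as a B-frame, so the result is `"IBBPBBBBBPBB"`.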
EP-A-2 077 672 relates to analyzing the transport stream so as to estimate the frame types of an encoded video signal. In a first embodiment, the “pattern” of the GoP is determined by finding the local size maximum within a small number of consecutive video frames, where the video frame with the maximum size is considered a P-frame if the small/large relationship so calculated matches predefined “determination frame patterns”. All other frames are considered B-frames.
In a second embodiment, frames are estimated as P-frames if they exceed a threshold calculated as the average of a number of preceding frames multiplied by a factor larger than one (e.g. 1.2). If this first test, which is intended to detect the open-GoP B,B,P pattern, fails, similar threshold-based tests aimed at other GoP patterns are performed. Since these tests are performed sequentially and the first success is taken as the final result, a mismatch at the beginning of the test chain cannot be corrected by the following tests.
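The first threshold test of this second embodiment can be sketched as follows. Only the factor 1.2 comes from the description above. The window length, the function name `exceeds_running_average` and the handling of missing history are assumptions, and the subsequent fallback tests are not reproduced.

```python
def exceeds_running_average(sizes, index, window=2, factor=1.2):
    """Return True if frame `index` is larger than `factor` times the
    average size of the `window` frames immediately preceding it.

    sizes:  frame sizes in bytes, in stream order.
    window: assumed number of preceding frames used for the average.
    factor: multiplier larger than one (1.2 as in the example above).
    """
    if index < window:
        raise ValueError("not enough preceding frames for the average")
    preceding = sizes[index - window:index]
    threshold = factor * sum(preceding) / window
    return sizes[index] > threshold
```

For the sizes `[8000, 8000, 15000, 8000, 8000, 9000]`, the frame at index 2 passes the test (15000 > 9600), while a small P-frame of 9000 bytes at index 5 would be missed because it does not exceed the same threshold of 9600.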
All frame-type estimations of the prior art rely on the assumption that P-frames always have a sufficiently larger size than the temporally surrounding B-frames of the sequence. In reality this is not always the case. Only the average values of the frame-type sizes reliably match this assumption.
Therefore it is favorable, as done in the present invention, to detect the general GoP structure by statistical means and to apply this knowledge to individual frames whenever the discrimination of frame types by their size is unreliable or ambiguous.