Packet video systems such as High Definition Television (“HDTV”) and Internet Protocol Television (“IPTV”) are becoming increasingly important and are replacing older non-packet broadcast television and video streaming. Such packet video systems can experience transmission problems that lead to lost or delayed packets. This, in turn, can degrade the quality of service delivered to the end-viewer, for example by producing frozen or distorted images.
Providers of broadcast or streaming video commonly encrypt video streams to ensure that only authorized persons can view the video content. While this encryption is necessary to prevent the unauthorized dissemination of the provider's video content, encryption also precludes easy diagnosis of transmission problems within the packet network. This is because packet analyzing systems cannot analyze degradations within an encrypted video stream to determine what effect those degradations might have on the quality of service delivered to the end-viewer. Because not all packet losses within a video stream will have a human-perceptible impact on the quality of the video, it is necessary for a network analyzer to determine the type and content of packets that are lost. Thereafter, the analyzer can estimate the subjective effects of the lost packets on the viewer.
It is well known in the art that most packet video systems display a series of pictures (or “frames”), each frame representing a small change from the previous frame. Frames are usually updated 10-50 times per second. Digital video that is carried over packet networks is usually compressed using standard methods such as MPEG-2, MPEG-4, or H.264. These compression techniques produce three distinct types of frames, denoted as I-frames, P-frames, and B-frames, respectively. Each frame has a picture header that identifies the type of frame and contains other data related to the image size and encoding.
I-frames (“intra” frames) are intra-frame encoded frames that do not depend upon past or successive frames in the video stream to aid in the reconstruction of the video image at the video receiver. Rather, the I-frame itself contains all the information needed to reconstruct an entire visible picture at the video receiver. As such, the I-frame is the largest in size of the three types of frames, typically 2-5 times as large as a P-frame or B-frame. Because of its large size, an I-frame must often be broken up and sent in multiple packets over the packet network. An I-frame, like a P-frame, is also known as a “reference frame” because it provides a point of reference against which later frames can be compared for reconstruction of a video image.
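By way of illustration only, the fragmentation of a large frame into multiple packets can be sketched as follows. The function name `packets_for_frame`, the example frame sizes, and the default payload of 1316 bytes (seven 188-byte MPEG transport stream packets per datagram, a common but not universal arrangement) are assumptions made for this sketch and are not taken from the description above.

```python
def packets_for_frame(frame_bytes: int, payload_bytes: int = 1316) -> int:
    """Return how many packets are needed to carry one encoded frame.

    payload_bytes defaults to 1316 (7 x 188-byte transport stream packets),
    which is an assumed, illustrative value.
    """
    if frame_bytes < 0:
        raise ValueError("frame_bytes must be non-negative")
    if payload_bytes <= 0:
        raise ValueError("payload_bytes must be positive")
    # Integer ceiling division: a partially filled packet still counts.
    return (frame_bytes + payload_bytes - 1) // payload_bytes


# Hypothetical sizes: an I-frame four times as large as a P-frame.
print(packets_for_frame(60_000))  # 46
print(packets_for_frame(15_000))  # 12
```

Under these assumed sizes, the loss of any one of the 46 packets damages the I-frame, which suggests why larger frames are more exposed to packet loss than smaller ones.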
P-frames (“predicted” frames) are inter-frame encoded frames that are dependent upon prior frames in the video stream to reconstruct a video image at the video receiver. In essence, P-frames contain only the differences between the current image and the image contained in a prior reference frame. Therefore, P-frames are typically much smaller than I-frames, especially for video streams with relatively little motion or change of scenes. P-frames are also reference frames themselves, upon which successive P-frames or B-frames can rely for encoding purposes.
B-frames (“bi-directionally predicted” frames) are inter-frame encoded frames that depend both upon prior frames and upon successive frames in the video stream. B-frames are typically the smallest type of frame and, in the conventional coding structures considered here, are not used as reference frames.
A typical video stream is divided up into a series of frame units, each known as a “Group of Pictures” (“GoP”). Each GoP begins with an I-frame and is followed by a series of P-frames and B-frames. The length of the GoP can be either fixed or variable. A typical GoP might last for 15 frames. In video sequences where there is little motion and few scene changes, the P-frames and B-frames will tend to be small because little has changed in the image since the previous reference frame. However, in a video sequence with considerable motion or many scene changes, the P-frames and B-frames will be considerably larger because they must contain more data to indicate the large number of changes from the previous reference frame. Some video compression algorithms will even include an I-frame in the middle of a GoP when necessitated by a large amount of motion or a scene change in the video sequence. This allows successive P-frames and B-frames to reference the recent I-frame and hence to contain smaller amounts of data.
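As a minimal sketch of a fixed GoP layout, the following builds a 15-frame GoP in display order. The function name `build_gop` and the spacing of three frames between reference frames are assumptions chosen for illustration; actual encoders may use other lengths, spacings, or variable structures.

```python
from collections import Counter


def build_gop(length: int = 15, ref_spacing: int = 3) -> str:
    """Return a display-order GoP pattern such as 'IBBPBB...'.

    The first frame is an I-frame, every ref_spacing-th frame after it is
    a P-frame, and all remaining frames are B-frames. Both parameters are
    illustrative assumptions, not values mandated by any standard.
    """
    if length < 1 or ref_spacing < 1:
        raise ValueError("length and ref_spacing must be at least 1")
    frames = []
    for i in range(length):
        if i == 0:
            frames.append("I")
        elif i % ref_spacing == 0:
            frames.append("P")
        else:
            frames.append("B")
    return "".join(frames)


gop = build_gop()
print(gop)                  # IBBPBBPBBPBBPBB
print(dict(Counter(gop)))   # {'I': 1, 'B': 10, 'P': 4}
```

In this assumed layout only 5 of the 15 frames are reference frames, yet those are the frames upon which the remaining frames depend.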
The size of the encoded frames can also depend upon the amount of detail in the video sequence to be encoded. Images in a video sequence with high detail will produce encoded frames with more data than images in a video sequence with low detail.
As discussed previously, not all packet losses or packet delays will have a human-perceptible impact upon a video sequence. The loss of a single B-frame will have little impact, because no other frames are dependent upon that frame and hence the image will only be distorted for the fraction of a second corresponding to the single B-frame. The loss of a reference frame (an I-frame or P-frame), however, will affect any P-frame or B-frame that depends on the reference frame. A series of packet losses, especially those involving reference frames, will begin to cause human-perceptible degradations in the video image quality. Furthermore, losses of reference frames at the beginning of a scene change or during high-motion video sequences are more likely to cause human-perceptible distortions than losses of reference frames in relatively static video sequences. Conversely, losses of non-reference frames during scene changes or high-motion video sequences are less likely to produce noticeable distortions because the visual artifact is obscured by the rapid changes in the images presented to the viewer.
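The propagation of a loss through dependent frames can be sketched under a deliberately simplified dependency model. The model assumes that a P-frame depends only on the nearest preceding reference frame, that a B-frame depends on the nearest reference frames on either side of it within the same GoP, and that no frame depends on anything outside its GoP. The function name `affected_frames` and this model are assumptions for illustration; real codecs permit richer reference structures.

```python
def affected_frames(gop: str, lost: int) -> list[int]:
    """Return display-order indices of frames damaged by losing one frame.

    gop is a string of 'I', 'P' and 'B' characters in display order.
    Dependencies follow the simplified model described in the lead-in.
    """
    if any(t not in "IPB" for t in gop):
        raise ValueError("gop may contain only 'I', 'P' and 'B'")
    if not 0 <= lost < len(gop):
        raise IndexError("lost frame index out of range")

    refs = [i for i, t in enumerate(gop) if t in "IP"]
    deps: dict[int, list[int]] = {}
    for i, t in enumerate(gop):
        prev_ref = max((r for r in refs if r < i), default=None)
        next_ref = min((r for r in refs if r > i), default=None)
        if t == "I":
            deps[i] = []                      # self-contained
        elif t == "P":
            deps[i] = [] if prev_ref is None else [prev_ref]
        else:                                 # B-frame
            deps[i] = [r for r in (prev_ref, next_ref) if r is not None]

    # Repeatedly mark frames whose dependencies are already damaged.
    damaged = {lost}
    changed = True
    while changed:
        changed = False
        for i, needed in deps.items():
            if i not in damaged and any(r in damaged for r in needed):
                damaged.add(i)
                changed = True
    return sorted(damaged)


gop = "IBBPBBPBBPBBPBB"
print(affected_frames(gop, 1))        # [1]
print(affected_frames(gop, 12))       # [10, 11, 12, 13, 14]
print(len(affected_frames(gop, 0)))   # 15
```

In this sketch the loss of a single B-frame damages only that frame, whereas the loss of the I-frame damages every frame in the GoP, consistent with the distinction drawn above between reference and non-reference frames.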
The GoP length and structure can be fixed or variable, depending upon the particular video stream. For streams with a fixed GoP structure, the I, P, and B frames occur at well-defined and fixed intervals within the video stream. In such a case, a network analyzer can easily determine whether a lost packet is part of an I, P, or B frame even if the video stream is encrypted. However, if the GoP structure within an encrypted video stream is variable or unknown to the network analyzer, then the network analyzer cannot readily determine the nature or content of a lost packet. Prior art systems have attempted to partially decrypt packets to determine their content. However, this will only work if the network analyzer has the encryption key.
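For the fixed-structure case, the inference of a frame type from position alone can be sketched as follows. The sketch assumes that the analyzer already knows the repeating GoP pattern, the index of each frame within the stream, and the index of some frame known to begin a GoP. How those values are obtained is outside the scope of this sketch, and the name `frame_type_at` and its parameters are hypothetical.

```python
def frame_type_at(frame_index: int,
                  gop_pattern: str = "IBBPBBPBBPBBPBB",
                  first_i_index: int = 0) -> str:
    """Infer a frame's type from its position in a fixed, repeating GoP.

    No payload is inspected, so the lookup is unaffected by encryption.
    gop_pattern and first_i_index are assumed to be known in advance.
    """
    if not gop_pattern:
        raise ValueError("gop_pattern must not be empty")
    # Position within the repeating pattern, counted from a known GoP start.
    offset = (frame_index - first_i_index) % len(gop_pattern)
    return gop_pattern[offset]


print(frame_type_at(0))    # I
print(frame_type_at(16))   # B
print(frame_type_at(18))   # P
print(frame_type_at(30))   # I
```

When the GoP structure is variable or unknown, no such fixed pattern exists and a positional lookup of this kind cannot be applied.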