Digital signal compression is widely used in many multimedia applications and devices. Digital signal compression using a coder/decoder (codec) allows streaming media, such as audio or video signals, to be transmitted over the Internet or stored on compact discs. A number of different standards of digital video compression have emerged, including H.261, H.263, DV, MPEG-1, MPEG-2, MPEG-4, VC-1, and AVC (H.264). These standards, as well as other video compression technologies, seek to represent video pictures efficiently by eliminating the spatial redundancies within each picture and the temporal redundancies among successive pictures. Through the use of such compression standards, video content can be carried in highly compressed video bit streams, and thus efficiently stored on disks or transmitted over networks.
MPEG-4 AVC (Advanced Video Coding), also known as H.264, is a video compression standard that offers significantly greater compression than its predecessors. The H.264 standard is expected to offer up to twice the compression of the earlier MPEG-2 standard, as well as improvements in perceptual quality. As a result, more and more video content is being delivered in the form of AVC (H.264)-coded streams. Two rival high-definition optical disc formats, HD DVD and Blu-ray Disc, support H.264/AVC High Profile decoding as a mandatory player feature. AVC (H.264) coding is described in detail in "Draft of Version 4 of H.264/AVC (ITU-T Recommendation H.264 and ISO/IEC 14496-10 (MPEG-4 part 10) Advanced Video Coding)" by Gary Sullivan, Thomas Wiegand and Ajay Luthra, Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6), 14th Meeting: Hong Kong, CH 18-21 January, 2005, the entire contents of which are incorporated herein by reference for all purposes.
Modern video coder/decoders (codecs), such as MPEG-2, MPEG-4 and H.264, generally divide video frames into three basic types known as Intra-Frames, Predictive Frames and Bi-predictive Frames, which are typically referred to as I-frames, P-frames and B-frames, respectively.
An I-frame is a picture coded without reference to any picture except itself. I-frames are used for random access and as references for decoding P-frames and B-frames. An encoder may generate I-frames to create random access points, which allow a decoder to start decoding properly from scratch at a given picture location. An encoder may also generate I-frames when differences in image detail between pictures prevent the generation of effective P-frames or B-frames. Because an I-frame contains a complete picture, I-frames typically require more bits to encode than P-frames or B-frames.
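The random-access role of I-frames can be illustrated with a small sketch: to begin decoding at a requested picture, a decoder backs up to the nearest I-frame at or before it. This is a minimal illustration, assuming a closed-GoP style stream in which no picture after an I-frame references anything before it; the function name `random_access_start` and the string frame labels are hypothetical and not part of any standard API.

```python
from typing import Sequence


def random_access_start(frame_types: Sequence[str], target: int) -> int:
    """Return the index of the I-frame from which decoding must begin to
    reach picture `target`. Indices are in decoding order.

    Assumption (hypothetical, closed-GoP style): no picture after the chosen
    I-frame references any picture before it, so earlier frames can be skipped.
    """
    if not 0 <= target < len(frame_types):
        raise IndexError("target out of range")
    # Walk backwards until we find a picture that needs no other reference.
    for i in range(target, -1, -1):
        if frame_types[i] == "I":
            return i
    raise ValueError("no I-frame at or before target; cannot start decoding")


# Usage: a stream "I P P I P P" listed in decoding order.
types = ["I", "P", "P", "I", "P", "P"]
print(random_access_start(types, 5))  # 3
print(random_access_start(types, 2))  # 0
```

Real streams complicate this (for example, H.264 distinguishes IDR pictures from ordinary I-pictures, and open-GoP B-frames may reference pictures before the entry point), so the sketch only captures the basic idea.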
P-frames require the prior decoding of some other picture(s) in order to be decoded. P-frames typically require fewer bits for encoding than I-frames. A P-frame contains encoded information regarding differences relative to a previous I-frame in decoding order. A P-frame typically references the preceding I-frame in a Group of Pictures (GoP). P-frames may contain image data, motion vector displacements, or combinations of the two. In some standard codecs (such as MPEG-2), P-frames use only one previously-decoded picture as a reference during decoding, and require that picture to also precede the P-frame in display order. In H.264, P-frames can use multiple previously-decoded pictures as references during decoding, and can have any arbitrary display-order relationship relative to the picture(s) used for their prediction.
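How a P-frame block combines a motion vector displacement with coded difference data can be sketched as follows: the decoder copies a block from the reference picture, offset by the motion vector, and adds the decoded residual. This is a simplified illustration, assuming integer-pixel motion vectors that stay inside the reference picture, 8-bit samples, and no sub-pixel interpolation or edge padding (which real codecs such as H.264 do perform); the helper names are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # rows of 8-bit luma samples


def predict_block(ref: Frame, x: int, y: int, mv_x: int, mv_y: int,
                  size: int) -> Frame:
    """Copy a size x size block for position (x, y) from the reference picture,
    displaced by an integer motion vector (mv_x, mv_y)."""
    return [[ref[y + mv_y + r][x + mv_x + c] for c in range(size)]
            for r in range(size)]


def reconstruct_block(ref: Frame, x: int, y: int, mv_x: int, mv_y: int,
                      residual: Frame) -> Frame:
    """Motion-compensated prediction plus the decoded residual, clipped to 8 bits."""
    size = len(residual)
    pred = predict_block(ref, x, y, mv_x, mv_y, size)
    return [[max(0, min(255, pred[r][c] + residual[r][c])) for c in range(size)]
            for r in range(size)]


# Usage: a 4x4 reference picture, a 2x2 block at (0, 0) with motion vector (1, 1).
ref = [[r * 4 + c for c in range(4)] for r in range(4)]
print(reconstruct_block(ref, 0, 0, 1, 1, [[1, 0], [0, -1]]))  # [[6, 6], [9, 9]]
```

When the reference matches the current picture well, the residual is small and cheap to code, which is why P-frames typically need fewer bits than I-frames.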
B-frames require the prior decoding of either an I-frame or a P-frame in order to be decoded. Like P-frames, B-frames may contain image data, motion vector displacements, or combinations of the two. B-frames may include some prediction modes that form a prediction of a motion region (e.g., a segment of a frame such as a macroblock or a smaller area) by averaging the predictions obtained using two different previously-decoded reference regions. In some codecs (such as MPEG-2), B-frames are never used as references for the prediction of other pictures. As a result, a lower quality encoding (using fewer bits than would otherwise be used) can be applied to such B pictures, because the loss of detail will not harm the prediction quality for subsequent pictures. In other codecs, such as H.264, B-frames may or may not be used as references for the decoding of other pictures, at the discretion of the encoder. Some codecs (such as MPEG-2) use exactly two previously-decoded pictures as references during decoding, and require one of those pictures to precede the B-frame in display order and the other one to follow it. In other codecs, such as H.264, a B-frame can use one, two, or more than two previously-decoded pictures as references during decoding, and can have any arbitrary display-order relationship relative to the picture(s) used for its prediction. B-frames typically require fewer bits for encoding than either I-frames or P-frames.
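Two consequences of the B-frame rules above can be sketched in a few lines. First, averaging two predictions of the same region is shown with rounding, as in the default (unweighted) bi-prediction of H.264. Second, because an MPEG-2-style B-frame needs a reference that follows it in display order, the encoder must transmit that later anchor (I- or P-frame) before the B-frames that depend on it, so decoding order differs from display order. This is an illustration only, assuming an MPEG-2-style structure in which every B-frame is followed by an anchor; the function names are hypothetical.

```python
from typing import List, Tuple


def bipred_average(pred0: List[int], pred1: List[int]) -> List[int]:
    """Rounded average of two predictions of the same region: (a + b + 1) >> 1."""
    return [(a + b + 1) >> 1 for a, b in zip(pred0, pred1)]


def display_to_decode_order(frames: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Reorder (display_index, type) pairs so that each anchor (I or P) is
    decoded before the B-frames that precede it in display order."""
    decode: List[Tuple[int, str]] = []
    pending_b: List[Tuple[int, str]] = []
    for frame in frames:
        if frame[1] == "B":
            pending_b.append(frame)   # must wait for the next anchor
        else:
            decode.append(frame)      # anchor is decoded first ...
            decode.extend(pending_b)  # ... then the B-frames that reference it
            pending_b.clear()
    if pending_b:
        raise ValueError("trailing B-frames have no following reference")
    return decode


# Usage: display order I0 B1 B2 P3 B4 B5 P6.
print(bipred_average([10, 11], [13, 12]))  # [12, 12]
order = display_to_decode_order(list(enumerate("IBBPBBP")))
print([i for i, _ in order])  # [0, 3, 1, 2, 6, 4, 5]
```

In codecs such as H.264, where a B-frame may reference any previously decoded pictures, the encoder chooses the decoding order more freely, so this fixed reordering rule applies only to the MPEG-2-style case.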
As used herein, the terms I-frame, B-frame and P-frame may be applied to any streaming data units that have similar properties to I-frames, B-frames and P-frames, e.g., as described above with respect to the context of streaming video.
Video encoding often takes advantage of the fact that within a given video scene certain elements of the visual content tend to remain relatively static. It is therefore possible to reduce the data needed to encode a video signal by coding a given picture in terms of the differences between that picture and a previous picture used as a reference. However, if a video sequence contains a scene change, there might not be a previous picture that is usable as a reference. It is therefore useful for an encoding program to be able to detect a scene change, since a change of scene can have an effect on the encoding process. Previous scene change detection algorithms have been based on analysis of the content of video frames.
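As an illustration of the kind of content-based analysis mentioned above, a very simple detector compares each frame with its predecessor using the mean absolute difference (MAD) of co-located samples: a large MAD means the previous picture would be a poor reference. This is a generic sketch, not the method of any particular prior algorithm; it assumes frames given as equal-length flat lists of luma samples, and the threshold value of 30 is a hypothetical choice, not a recommended setting.

```python
from typing import List, Sequence


def mean_abs_diff(cur: Sequence[int], prev: Sequence[int]) -> float:
    """Average absolute difference between co-located samples of two frames."""
    if len(cur) != len(prev) or not cur:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(a - b) for a, b in zip(cur, prev)) / len(cur)


def detect_scene_changes(frames: List[Sequence[int]],
                         threshold: float = 30.0) -> List[int]:
    """Return indices of frames whose MAD from the previous frame exceeds
    `threshold` (a hypothetical tuning parameter)."""
    return [i for i in range(1, len(frames))
            if mean_abs_diff(frames[i], frames[i - 1]) > threshold]


# Usage: two similar frames, then a cut to very different content.
frames = [[10, 10, 10, 10], [12, 11, 10, 9],
          [200, 190, 210, 205], [201, 191, 209, 205]]
print(detect_scene_changes(frames))  # [2]
```

A detector this simple ignores motion, so fast pans or flashes can trigger false detections; that limitation is one reason more elaborate content-based techniques exist.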
It is within this context that embodiments of the invention arise.