1. Field
The field of the invention is directed to encoding. More particularly, the field relates to removing temporal redundancy when encoding video by adaptively assigning frame types.
2. Background
In the 1990s, television technology switched from using analog methods for representing and transmitting video to digital methods. Once it was accepted that the existing solid-state technologies would support new methods for processing video, the benefits of digital video were quickly recognized. Digital video could be processed to match various types of receivers having different numbers of lines and line patterns that were either interlaced or progressive. The cable industry welcomed the opportunity to change the bandwidth-resolution tradeoff virtually on the fly, allowing up to twelve video channels, or 7-8 channels of digital video with superior picture quality, to be transmitted in a bandwidth that formerly carried one analog channel of video. Digital pictures would no longer be affected by ghosts caused by multipath in transmission.
The new technology offered the possibility of high definition television (HDTV), having a cinema-like image and a wide screen format. Unlike the current aspect ratio that is 4:3, the aspect ratio of HDTV is 16:9, similar to a movie screen. HDTV can include Dolby Digital surround sound, the same digital sound system used in DVDs and many movie theaters. Broadcasters could choose either to transmit a high resolution HDTV program or send a number of lower resolution programs in the same bandwidth. Digital television could also offer interactive video and data services.
There are two underlying technologies that drive digital television. The first uses transmission formats that take advantage of the higher signal-to-noise ratios typically available in channels that support video. The second is the use of signal processing to remove unneeded spatial and temporal redundancy present in a single picture or in a sequence of pictures. Spatial redundancy appears in pictures as relatively large areas that have little variation in them. Temporal redundancy refers to parts of a picture that reappear in a later or earlier picture. Eliminating temporal redundancy minimizes the duplication of data contained in successive pictures. Elimination is achieved by sending to the video decoder motion vector information (i.e., a pointer to a block of data that will become a reference for the block being considered) together with information indicative of the differences between the reference block and the block being processed.
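The motion vector and difference information described above can be illustrated with a minimal sketch. It assumes frames are plain lists of rows of integer samples, uses an exhaustive search with a sum of absolute differences (SAD) as the matching criterion, and ignores sub-pixel motion. The function names (`find_motion_vector`, `reconstruct`) and the search strategy are hypothetical choices for illustration and are not taken from any standard.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def get_block(frame, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def find_motion_vector(ref, cur, top, left, size, search):
    """Search ref for the block that best matches the cur block at (top, left).

    Returns the motion vector (dy, dx) and the residual block, i.e. the
    sample-by-sample difference between the current block and its reference.
    """
    target = get_block(cur, top, left, size)
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if r < 0 or c < 0 or r + size > height or c + size > width:
                continue
            cost = sad(target, get_block(ref, r, c, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    _, dy, dx = best
    reference = get_block(ref, top + dy, left + dx, size)
    residual = [[t - p for t, p in zip(tr, pr)]
                for tr, pr in zip(target, reference)]
    return (dy, dx), residual


def reconstruct(ref, top, left, size, mv, residual):
    """Decoder side: rebuild the block from the pointer and the differences."""
    dy, dx = mv
    reference = get_block(ref, top + dy, left + dx, size)
    return [[p + d for p, d in zip(pr, dr)]
            for pr, dr in zip(reference, residual)]
```

When picture content simply shifts between frames, the residual in this sketch is all zeros, so only the motion vector carries information. That is the sense in which the duplication of data between successive pictures is minimized.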
Temporal compression is defined by the type of motion prediction allowed by the standard. MPEG2 allows three types of pictures: I (Intracoded) pictures, P (Predictively Coded) pictures, and B (Bidirectionally Coded) pictures. H.264 adds the “skip frame” encoding option.
I pictures are encoded on a stand-alone basis. The MPEG2 standard allows I pictures to be compressed using spatial compression alone, though in the more recent H.264 standard, prediction within the I picture is allowed.
A P picture is predicted directly from the I picture or P picture that preceded it in time, although individual blocks in a P picture can always be encoded on a stand-alone basis using spatial compression. This option may be chosen if the best prediction block does not adequately reduce the number of bits needed to represent a P frame block.
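The per-block choice between prediction and stand-alone coding can be sketched as follows. This is only an illustration under stated assumptions: blocks are flat lists of integer samples, and the bit cost of each option is approximated by a crude proxy (the absolute sum of the prediction residual for the predicted option, and the absolute deviation from the block mean for the stand-alone option). A practical encoder would estimate actual bit counts instead. The name `choose_block_mode` is hypothetical.

```python
def abs_sum(values):
    """Sum of absolute values, used here as a rough stand-in for bit cost."""
    return sum(abs(v) for v in values)


def choose_block_mode(block, predicted):
    """Return 'inter' if prediction looks cheaper than stand-alone coding.

    block     -- samples of the P frame block being encoded
    predicted -- samples of the best prediction block from the reference
    """
    # Cost proxy for predictive coding: energy left in the residual.
    inter_cost = abs_sum(b - p for b, p in zip(block, predicted))
    # Cost proxy for stand-alone coding (an assumption of this sketch):
    # variation of the block around its own mean.
    mean = sum(block) / len(block)
    intra_cost = abs_sum(b - mean for b in block)
    return "inter" if inter_cost < intra_cost else "intra"
```

With these proxies, a block whose best prediction matches it closely is coded predictively, while a block with a poor prediction falls back to stand-alone coding.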
References for B picture blocks are derived from the I or P pictures that straddle the frame to be encoded in time. In the MPEG2 standard, only two pictures, one ahead and one behind in time, could be used as a source of reference blocks for bidirectional prediction. In the H.264 standard, up to 32 pictures can be used to find sources of reference blocks.
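The two-reference case can be illustrated with a short sketch. It assumes flat lists of integer samples and that the candidate reference blocks from the earlier and later pictures have already been located. It compares three candidate predictions (earlier picture only, later picture only, and a rounded average of the two) and keeps the one with the smallest sum of absolute differences. The function name, the mode labels, and the selection rule are assumptions made for illustration.

```python
def sad(a, b):
    """Sum of absolute differences between two flat sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def predict_b_block(block, past_ref, future_ref):
    """Pick the best prediction for a B picture block.

    past_ref   -- reference block from the picture earlier in time
    future_ref -- reference block from the picture later in time
    Returns the chosen mode and the residual to be encoded.
    """
    candidates = {
        "forward": list(past_ref),      # predict from the earlier picture
        "backward": list(future_ref),   # predict from the later picture
        # Rounded average of both references (rounding rule is an assumption).
        "bidirectional": [(p + f + 1) // 2
                          for p, f in zip(past_ref, future_ref)],
    }
    mode = min(candidates, key=lambda name: sad(block, candidates[name]))
    residual = [b - p for b, p in zip(block, candidates[mode])]
    return mode, residual
```

Averaging helps when the block lies between its two references in value, for example during a fade, because the averaged prediction can leave a smaller residual than either reference alone.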
There is a need for innovative systems for determining how to appropriately assign an encoding type for a frame.