Video coding/decoding systems find widespread application in many communication environments. They typically capture audio-visual content at a first location, code the content according to various bandwidth compression operations, transmit the coded content to a second location and decode the content for rendering at the second location. For video content, coding and decoding typically exploit temporal and spatial redundancies in the content, using motion compensated prediction techniques to reduce bandwidth of the coded signal.
Motion compensation techniques involve prediction of a new input frame using one or more previously-coded frames as a basis for the prediction. Video coders and decoders both store decoded versions of select frames that have been designated as “reference frames.” When a new input frame is to be coded according to motion compensation techniques, an encoder searches among the reference frames for content that closely matches content of the input frame. When a match is found, the encoder typically identifies the matching reference frame to the decoder, provides motion vectors that identify spatial displacement of the matching content with respect to the input content and codes residual data that represents a difference between the input data and the matching content of the reference frame. A decoder stores the same set of reference frames as does the encoder. When it is provided with identifiers of reference frames, motion vectors and coded residual data, the decoder can recover a replica of each input frame for display. Frames typically are parsed into spatial arrays of data (called “pixel blocks” herein) and motion vectors and coded residual data may be provided for each pixel block of the input frame.
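By way of illustration only, the search and reconstruction described above may be sketched as follows. The sketch assumes frames held as two-dimensional lists of sample values, an exhaustive search over a small window, and a sum-of-absolute-differences matching cost; the names `motion_search`, `reconstruct`, `get_block` and `sad` are hypothetical and are not drawn from any coding protocol. Transform, quantization and entropy coding of the residual are omitted.

```python
from typing import List, Tuple

Frame = List[List[int]]  # hypothetical representation: rows of sample values


def get_block(frame: Frame, top: int, left: int, size: int) -> Frame:
    """Copy the size x size pixel block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(a: Frame, b: Frame) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def motion_search(block: Frame, refs: List[Frame], top: int, left: int,
                  search_range: int) -> Tuple[int, Tuple[int, int], Frame]:
    """Encoder side: find the closest match for `block` among `refs`.

    Returns the reference frame identifier, the motion vector (dy, dx)
    and the residual (input block minus matching reference content).
    """
    size = len(block)
    best = None  # (cost, ref_id, (dy, dx))
    for ref_id, ref in enumerate(refs):
        height, width = len(ref), len(ref[0])
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                y, x = top + dy, left + dx
                # Skip candidates that fall outside the reference frame.
                if y < 0 or x < 0 or y + size > height or x + size > width:
                    continue
                cost = sad(block, get_block(ref, y, x, size))
                if best is None or cost < best[0]:
                    best = (cost, ref_id, (dy, dx))
    if best is None:
        raise ValueError("no candidate block fits inside any reference frame")
    _, ref_id, (dy, dx) = best
    prediction = get_block(refs[ref_id], top + dy, left + dx, size)
    residual = [[s - p for s, p in zip(rs, rp)]
                for rs, rp in zip(block, prediction)]
    return ref_id, (dy, dx), residual


def reconstruct(refs: List[Frame], ref_id: int, mv: Tuple[int, int],
                residual: Frame, top: int, left: int) -> Frame:
    """Decoder side: rebuild the block from the same reference frames."""
    dy, dx = mv
    size = len(residual)
    prediction = get_block(refs[ref_id], top + dy, left + dx, size)
    return [[p + r for p, r in zip(rp, rr)]
            for rp, rr in zip(prediction, residual)]
```

Because the decoder in this sketch holds the same list `refs` as the encoder, passing the values returned by `motion_search` to `reconstruct` recovers the input block exactly. A practical coder would code the residual lossily, so the recovered block would be a close replica rather than an identical copy.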
Motion compensated prediction, therefore, requires that video coders and decoders both store a predetermined number of reference frames for use in coding and decoding. Modern coding protocols, such as H.263 and H.264, define predetermined limits on the number of reference frames that are to be stored at encoders and decoders. Thus, encoders and decoders typically are provided with a cache that stores only a predetermined number of reference pictures. During operation, if a reference picture cache stores the maximum number of reference pictures and a new reference picture is to be added, then a previously-stored reference picture will be evicted from the cache to accommodate the new reference picture. The evicted reference picture cannot thereafter be used as a basis for predicting new input frames.
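The eviction behavior described above may be illustrated with a minimal sketch. It assumes, for simplicity, that the oldest stored picture is the one evicted; actual coding protocols define their own rules for selecting the picture to be removed. The class name `ReferencePictureCache` and its methods are hypothetical.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class ReferencePictureCache:
    """Fixed-depth store of reference pictures (assumed oldest-first eviction)."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._pictures: "OrderedDict[Hashable, Any]" = OrderedDict()

    def add(self, frame_id: Hashable, picture: Any) -> Optional[Hashable]:
        """Store a new reference picture.

        Returns the identifier of the evicted picture, or None if the
        cache had room and nothing was evicted.
        """
        evicted = None
        if frame_id not in self._pictures and len(self._pictures) >= self.capacity:
            # Cache is full: drop the oldest entry to make room.
            evicted, _ = self._pictures.popitem(last=False)
        self._pictures[frame_id] = picture
        return evicted

    def get(self, frame_id: Hashable) -> Optional[Any]:
        """Return the stored picture, or None if it was never stored or was evicted."""
        return self._pictures.get(frame_id)

    def __contains__(self, frame_id: Hashable) -> bool:
        return frame_id in self._pictures

    def __len__(self) -> int:
        return len(self._pictures)
```

With a capacity of two, for example, adding pictures 0, 1 and 2 in turn evicts picture 0 on the third call, after which `get(0)` returns `None` and picture 0 is no longer available as a prediction reference.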
The limited depth of reference picture caches is unsatisfactory for many coding applications. In applications where image content may include moving foreground content over a relatively static background, background elements are likely to have very high temporal redundancy and can be coded efficiently. However, if a foreground element obscures a background element for such a long duration that the reference picture cache has evicted every reference frame that includes the background element, a video coder will be unable to code the background element predictively when the foreground element moves again and the formerly-obscured background element is revealed.
Accordingly, there is a need in the art for a predictive coding system that effectively extends the reach of motion compensated prediction techniques to include content of reference pictures that have been evicted from encoder and decoder caches.