In a video compression method, encoded video frames can be categorized, based on how they are encoded, into intra-coded frames (I-frames) and inter-coded frames. Inter-coded frames can be further categorized, based on the prediction direction, into predictive frames (P-frames) and bi-directional predictive frames (B-frames). I-frames are processed using only pixels within the same frame for prediction (or with no prediction at all); the compression achieved thereby comes from the spatial relationship of the pixels within the video frame itself. A P-frame, on the other hand, is encoded by referencing a previous I- or P-frame for motion estimation and the generation of motion vectors and residuals; the compression achieved thereby comes from the encoding of the motion vectors and residuals. Besides referencing a previous frame, a B-frame also references a subsequent frame for motion estimation. Some video compression standards even conduct prediction by referencing up to n previous frames; H.264 is one such standard, allowing up to five previous frames to be referenced. For these standards, to encode or decode an inter-coded frame, the buffered-frame storage device of the codec has to store up to n frames. Conventionally, each frame is stored in the buffered-frame storage device as reconstructed data, which requires a large memory size and increases the amount of data accessed from the buffered-frame storage device.
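The buffered-frame storage described above behaves like a bounded first-in, first-out store of the n most recently reconstructed frames. The following is a minimal sketch of that behavior; the class and method names are hypothetical and chosen only for illustration, not taken from any standard.

```python
from collections import deque

class FrameBuffer:
    """Hypothetical buffered-frame store holding up to n reconstructed
    frames so that P- and B-frames can reference them for prediction."""

    def __init__(self, n):
        self.frames = deque(maxlen=n)  # the oldest frame is evicted first

    def push(self, frame):
        self.frames.append(frame)

    def reference(self, frames_back):
        # frames_back = 1 selects the most recently stored frame
        return self.frames[-frames_back]

buf = FrameBuffer(n=5)          # e.g. five reference frames, as in H.264
for i in range(7):
    buf.push(f"frame-{i}")

print(buf.reference(1))  # frame-6 (most recent reconstructed frame)
print(len(buf.frames))   # 5 (never exceeds n)
```

Because every entry holds a fully reconstructed frame, the memory footprint grows linearly with n, which is the cost the passage above refers to.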
H.264, also known as Advanced Video Coding (AVC) of MPEG-4 Part 10, is the latest video encoding and decoding standard developed by the Joint Video Team (JVT) formed by the Video Coding Experts Group (VCEG) of the ITU-T and the MPEG group of the ISO. The objective of H.264 is to achieve high compression ratios for video applications such as video conferencing, digital storage media, television broadcasting, and Internet streaming and communications.
Besides being applied to different applications, H.264 differs from conventional standards such as H.261, H.263, MPEG-1, and MPEG-2 in the following aspects. H.264 adds intra prediction within intra coding, uses an integer transform instead of the discrete cosine transform (DCT), and obtains motion vectors by referencing up to five previous video frames based on variable-sized blocks. However, as the reference frames are stored in the buffered-frame storage device as reconstructed data, up to five frames have to be searched by an H.264 codec so as to locate the most closely resembling reference block for motion estimation during inter coding. This significantly increases not only the memory required for the storage of reference frames, but also the amount of data accessed from the storage device. The H.264 codec therefore consumes a considerable amount of power during its operation.
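The multi-frame search cost can be illustrated with a small full-search block-matching sketch: every candidate position in every buffered reference frame is compared against the current block, so the pixel traffic scales with the number of reference frames. The frame size, 4x4 block size, and search range below are assumptions made for the sketch, not values taken from the H.264 specification.

```python
import numpy as np

def sad(a, b):
    # sum of absolute differences between two equally sized blocks
    return int(np.abs(a.astype(int) - b.astype(int)).sum())

def best_match(block, refs, y, x, search=2):
    # exhaustive search: every candidate position in every reference
    # frame costs one read of a full block of pixels from storage
    best = None
    h, w = block.shape
    for ref_idx, ref in enumerate(refs):
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                yy, xx = y + dy, x + dx
                if 0 <= yy <= ref.shape[0] - h and 0 <= xx <= ref.shape[1] - w:
                    cost = sad(block, ref[yy:yy + h, xx:xx + w])
                    if best is None or cost < best[0]:
                        best = (cost, ref_idx, dy, dx)
    return best

# five all-zero reference frames; only refs[3] contains the pattern
refs = [np.zeros((8, 8), dtype=np.uint8) for _ in range(5)]
pattern = np.arange(16, dtype=np.uint8).reshape(4, 4)
refs[3][2:6, 3:7] = pattern

# the match is found in the fourth buffered frame at offset (-1, +1)
print(best_match(pattern, refs, y=3, x=2))  # (0, 3, -1, 1)
```

With five buffered frames the inner loop runs five times as often as with one, which is exactly the increase in storage-device access (and hence power) that the paragraph above describes.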
In 1996, Yogeshwar et al. (U.S. Pat. No. 6,222,886) disclosed a video decoder for compressed video and a related decoding method. As illustrated in FIG. 1, compressed video data are received in a channel buffer 101 and then decompressed by a decoder. A part of the decompressed data is then recompressed to serve as reference frames. Motion compensation is conducted on a reference frame. Finally, regions of interest 103 in the reference frame are decompressed.
The approach disclosed by Yogeshwar et al. deteriorates video quality because the blocks are re-quantized. In addition, the blocks inside the regions of interest 103 have to be processed to retrieve the referenced regions to be decoded.
In 1999, Miller (U.S. Pat. No. 6,633,608) disclosed a method and a device for reducing the memory size and memory bandwidth of an MPEG decoder. FIG. 2 is a flow chart showing the process steps of this conventional method for reducing memory size and bandwidth. As illustrated, within a first mode of operation, the memory is used to store reference frames first (step 201). Afterwards, within a second mode of operation, some of the memory is used for other purposes (step 210). As the video quality would deteriorate when the buffered-frame memory is reduced, a frequency domain coding scheme is adopted for compensation. The frequency domain coding scheme is used in conjunction with the buffered-frame memory in the second mode of operation. In addition, the video data are compressed prior to storage, and the stored compressed data are decompressed prior to utilization. The compression techniques used include down-sampling, transforming, etc.
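The compress-before-store idea can be sketched as follows. Here a reference frame is down-sampled 2:1 in each dimension before being written to the buffered-frame memory and up-sampled by simple pixel replication when it is needed again; the 2:1 factor and nearest-neighbor reconstruction are illustrative assumptions, not Miller's exact scheme.

```python
import numpy as np

def store(frame):
    # keep every second pixel in each dimension: 1/4 of the memory
    return frame[::2, ::2].copy()

def load(stored):
    # reconstruct by pixel replication (nearest-neighbor up-sampling)
    return stored.repeat(2, axis=0).repeat(2, axis=1)

frame = np.arange(64, dtype=np.uint8).reshape(8, 8)
stored = store(frame)
restored = load(stored)

print(stored.nbytes, frame.nbytes)        # 16 64
print(np.array_equal(restored, frame))    # False: detail is lost
```

The sketch makes the trade-off concrete: memory shrinks by a factor of four, but the reconstructed frame no longer equals the original, which is the quality deterioration noted below.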
However, the method disclosed by Miller did not address the problem associated with the order of accessing the buffered frames. Miller's approach also causes the video quality to deteriorate, for example through a reduction of the resolution and of the peak signal-to-noise ratio (PSNR). Additionally, neither the approach of Miller nor that of Yogeshwar et al. addressed the problem of the increasing amount of reference-frame data access. Neither approach covers an adaptive mechanism for determining the storage types of the blocks.