Field of the Invention
The invention relates in general to a video encoding method and associated apparatus, and more particularly, to a video encoding method and associated apparatus capable of reducing a buffer bandwidth requirement for enhancing performance in video encoding.
Description of the Related Art
Because they allow diversified audiovisual messages to be transmitted and recorded, image data and the video encoding techniques applied to them have become indispensable constituents of the modern information society. To reduce the file size of image data, the image data must be encoded. Encoding requires memory for storage and processing power for the encoding itself. Thus, the optimization of video encoding performance has become a focus of research and development, in an attempt to increase the efficiency of the encoding process.
Image data are formed by a plurality of sequential frames. Each of the frames includes a plurality of pixels, each associated with three component data, e.g., one luma component data and two chroma component data in a YUV color space. When original image data are video encoded, the frames of the original image data are encoded separately yet in relation to one another. For example, a frame is encoded as an intra-coded frame (I-frame), a predicted frame (P-frame), or a bidirectionally predicted frame (B-frame).
To encode a current frame into a P-frame or a B-frame, one or more reference frames are referenced. A situation of referencing one reference frame is described as follows. A current frame is divided into a plurality of blocks (e.g., 16*16 macroblocks), each associated with a search window in the reference frame. Each search window covers the corresponding block and a plurality of neighboring pixels of a peripheral region in the reference frame. When video encoding a current block of the current frame, the luma component data of the current block are compared with the luma component data of the corresponding search window to perform a luma motion estimation and obtain a luma motion vector. According to the luma motion vector as well as the luma component data and the two chroma component data of each pixel in the search window, a motion compensation, including a luma compensation and a chroma compensation, can be performed to obtain a similar block, which is similar to the current block. A residual block is obtained by subtracting the similar block from the current block, and the residual block is further compressed. The compressed residual block and the motion vector constitute an encoding result representing the current block. A situation of referencing a plurality of reference frames can be deduced similarly.
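The luma motion estimation and residual computation described above can be sketched as follows. This is only an illustrative sketch: the text does not specify a matching criterion or a search strategy, so the sum of absolute differences (SAD), the exhaustive search, the default search range, and every function name here are assumptions made for the example.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def crop(frame, top, left, size):
    """Return the size*size block whose top-left sample is frame[top][left]."""
    return [row[left:left + size] for row in frame[top:top + size]]


def estimate_motion(cur_luma, ref_luma, top, left, size=16, search_range=4):
    """Exhaustively search the window around (top, left) in the reference frame.

    Returns ((dy, dx), cost) for the candidate with the lowest SAD.
    The matching criterion and search range are assumptions, not from the text.
    """
    height, width = len(ref_luma), len(ref_luma[0])
    current = crop(cur_luma, top, left, size)
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(current, crop(ref_luma, y, x, size))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost


def residual_block(cur_luma, ref_luma, top, left, mv, size=16):
    """Subtract the similar block (selected by mv) from the current block."""
    current = crop(cur_luma, top, left, size)
    similar = crop(ref_luma, top + mv[0], left + mv[1], size)
    return [[c - s for c, s in zip(cur_row, sim_row)]
            for cur_row, sim_row in zip(current, similar)]
```

Only luma component data are read during the search; the chroma component data are needed only once the motion vector is known, which is why the search dominates the amount of reference data accessed.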
From the perspective of video decoding, when an encoded current frame is decoded, the similar block corresponding to each of the blocks of the current frame can be obtained from a reference frame by referring to the motion vector of that block. An original (unencoded) frame can be reconstructed by combining the similar blocks and the residual blocks corresponding to each of the blocks of the current frame.
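The reconstruction step can be illustrated with a short sketch. It assumes the residual block has already been decompressed, that samples are 8-bit values, and that results are clipped to the range 0 to 255; the clipping and the function name are assumptions for illustration and are not stated in the text.

```python
def reconstruct_block(ref_luma, top, left, mv, residual):
    """Rebuild a block as (similar block from the reference frame) + residual.

    ref_luma : 2-D list of reference-frame samples
    top, left: position of the block in the current frame
    mv       : (dy, dx) motion vector pointing into the reference frame
    residual : 2-D list of decompressed residual values (square block)
    """
    size = len(residual)
    dy, dx = mv
    block = []
    for r in range(size):
        ref_row = ref_luma[top + dy + r][left + dx:left + dx + size]
        # Clip to the 8-bit sample range (an assumption of this sketch).
        block.append([max(0, min(255, p + e))
                      for p, e in zip(ref_row, residual[r])])
    return block
```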
When implementing encoding techniques, a buffer (e.g., a frame buffer) must be utilized for storing reference frames. FIG. 1 shows a schematic diagram of accessing a buffer 10 for video encoding in the prior art. To perform video encoding, two memory regions 12a and 12b in the buffer 10 are allocated to a reference frame. The luma component data of the reference frame are stored in the memory region 12a, and the two chroma component data of the reference frame are stored in the memory region 12b. Referring to FIG. 1, the luma component data and the two chroma component data of the reference frame in a block (e.g., a 16*16 macroblock) are depicted. Under a 4:2:0 video encoding format, each block is associated with 16*16 luma component data Y, 8*8 chroma component data U, and 8*8 chroma component data V. Each luma component data Y includes 8 bits (one byte), and each of the chroma component data U and V also includes 8 bits (one byte). In the memory region 12a, the 16*16 luma component data Y of each block are stored in 16 adjacent bytes of 16 adjacent rows; in the memory region 12b, the 8*8 chroma component data U and the 8*8 chroma component data V of each block are stored in 8 adjacent rows, with each row alternately storing the chroma component data U and the chroma component data V. That is to say, the chroma component data U and the chroma component data V are interleaved as one column of chroma component data U followed by one column of chroma component data V, as shown in FIG. 1. To perform the luma motion estimation and the luma compensation, the prior art accesses (e.g., loads) the memory region 12a; to perform the chroma compensation, the prior art accesses the memory region 12b.
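The storage arrangement of the memory regions 12a and 12b can be made concrete with a small address calculation. The sketch below assumes that each region is stored in raster order with a row stride equal to the frame width in bytes, and that U precedes V within each interleaved pair; the function names and the stride are assumptions for illustration only.

```python
MB_SIZE = 16  # luma samples per block side (16*16 block, as in the text)


def luma_offset(x, y, width):
    """Byte offset of luma sample (x, y) inside memory region 12a."""
    return y * width + x


def chroma_offsets(x, y, width):
    """Byte offsets (U, V) inside memory region 12b for luma pixel (x, y).

    Under 4:2:0, each 2*2 group of luma pixels shares one U and one V sample.
    A chroma row holds width/2 interleaved U/V pairs, i.e. width bytes.
    """
    cx, cy = x // 2, y // 2
    base = cy * width + 2 * cx
    return base, base + 1


def macroblock_bytes():
    """Return (luma bytes, chroma bytes) stored for one 16*16 block."""
    luma = MB_SIZE * MB_SIZE             # 16 rows of 16 bytes
    chroma = 2 * (MB_SIZE // 2) ** 2     # 8 rows of 8 U bytes + 8 V bytes
    return luma, chroma
```

Under these assumptions one block occupies 256 bytes in the memory region 12a and 128 bytes in the memory region 12b.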
One of the shortcomings of the prior art is that a large bandwidth (i.e., the amount of data accessed per unit of time) of the buffer 10 is required. When performing video encoding, the prior art in FIG. 1 needs to access more data from the memory region 12a than from the memory region 12b, because the luma motion estimation reads the luma component data of an entire search window for each block. Therefore, a considerably large bandwidth is required for accessing the memory region 12a of the buffer 10 if the luma motion estimation is to be completed within the same period of time. The requirement of a large bandwidth hinders the implementation and promotion of video encoding techniques. In addition, for high-resolution images having an even greater number of blocks, the above shortcoming of the prior art becomes even more pronounced.
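A rough estimate shows how the luma access grows with the search window. The sketch below assumes a square search window extending a hypothetical search_range pixels beyond the block on every side, and assumes the whole window is loaded anew for every block with no reuse between neighboring blocks; the text specifies neither, so the figures are illustrative upper bounds rather than measurements.

```python
def luma_window_bytes(search_range, block=16):
    """Luma bytes in one search window (one byte per sample)."""
    side = block + 2 * search_range
    return side * side


def chroma_block_bytes(block=16):
    """U plus V bytes of one block under 4:2:0 (one byte per sample)."""
    return 2 * (block // 2) ** 2


def luma_bandwidth(width, height, fps, search_range, block=16):
    """Bytes per second read from region 12a for luma motion estimation,
    assuming one full window load per block and no data reuse."""
    blocks_per_frame = (width // block) * (height // block)
    return blocks_per_frame * luma_window_bytes(search_range, block) * fps
```

For example, with a hypothetical search_range of 16, one search window holds 48*48 = 2304 luma bytes, compared with the 128 chroma bytes of a single block.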