The present invention relates to a method and apparatus for encoding video signals, and more particularly, to a method and apparatus for adaptively adjusting the encoding type of video data to be encoded depending upon an available memory bandwidth for the video encoder.
As the operating speed of electronic components increases, video data processing, which consumes considerable computing resources of a system, has become an important subject. In general, video data processing standards provide various encoding types to support different video processing requirements. For example, in the MPEG (Moving Picture Experts Group) 2 standard, a picture is compressed by eliminating spatial redundancies and temporal redundancies.
In general, there are some spatial similarities in chromatic, geometrical, or other characteristic values within a picture/image. In order to eliminate these spatial redundancies, it is required to identify significant elements of the picture and to remove the redundant elements that are less significant. For example, according to the MPEG 2 standard, spatial redundancies within a picture are eliminated by chrominance sampling, discrete cosine transform (DCT), and quantization. In addition, video data is actually a continuous series of pictures, which are perceived as a moving picture due to the persistence of vision in the human eye. Since the time interval between pictures is very short, the difference between neighboring pictures is generally small and mostly appears as a change in the location of visual objects. Therefore, the MPEG 2 standard eliminates temporal redundancies caused by the similarity between pictures to further compress the video data.
In order to eliminate the temporal redundancies mentioned above, a process referred to as motion compensation is employed in the MPEG 2 standard. Before performing motion compensation, a current picture to be processed is typically divided into 16×16 pixel sized macroblocks (MB). For each current macroblock, a most similar prediction block of a reference picture is then determined by comparing the current macroblock with “candidate” macroblocks of a preceding picture or a succeeding picture. The most similar prediction block is treated as a reference block, and the location difference between the current block and the reference block is then recorded as a motion vector. The above process of obtaining the motion vector is referred to as motion estimation. If the picture to which the reference block belongs is prior to the current picture, the process is called forward prediction. If the reference picture is posterior to the current picture, the process is called backward prediction. In addition, if the motion vector is obtained by referring to both a preceding picture and a succeeding picture of the current picture, the process is called bi-directional prediction. A commonly employed motion estimation method is the block-matching method. Additionally, because the reference block is probably not completely the same as the current block, it is required to calculate the difference between the current block and the reference block, which is also referred to as a prediction error. The prediction error is used for decoding the current block.
The MPEG 2 standard defines three encoding types to encode a picture: intra encoding, predictive encoding, and bi-directionally predictive encoding. An intra-coded picture (I-picture) is encoded independently without referring to a preceding picture or a succeeding picture. A predictive picture (P-picture) is encoded by referring to a preceding picture, where the preceding picture could be an I-picture or a P-picture. A bi-directionally predictive picture (B-picture) is encoded by referring to both a preceding picture and a succeeding picture.
Since no neighboring pictures are referred to when encoding an I-picture, less memory bandwidth is required. A reference picture is referred to when encoding a P-picture, so more memory bandwidth is required. Two reference pictures are referred to when encoding a B-picture, and therefore even more memory bandwidth is required.
A picture is composed of a plurality of macro-blocks, and the picture is encoded macro-block by macro-block. Each macro-block has a corresponding motion type parameter representing its motion compensation type. In the MPEG 2 standard, for example, each macro-block in the I-picture is intra-coded. The P-picture includes intra-coded and forward motion compensated macro-blocks. The B-picture may include intra-coded, forward motion compensated, backward motion compensated, and bi-directional motion compensated macro-blocks. As is well known in the art, an intra-coded macro-block is independently encoded without referring to other macro-blocks in a preceding picture or a succeeding picture. A forward motion compensated macro-block is encoded by referring to the forward prediction information of a most similar macro-block in the preceding picture. A bi-directional motion compensated macro-block is encoded by referring to the forward prediction information of a reference macro-block in the preceding picture and the backward prediction information of another reference macro-block in the succeeding picture.
As mentioned above, since the intra-coded macro-block is encoded without referring to other macro-blocks in the preceding picture or the succeeding picture, the memory bandwidth required while encoding an intra-coded macro-block is lower than that required for other types of macro-blocks. While encoding the forward motion compensated macro-block, it is required to refer to a reference macro-block in the preceding picture, so more memory bandwidth is required. Similarly, while encoding the bi-directional motion compensated macro-block, it is required to refer to both a reference macro-block in a preceding picture and another reference macro-block in a succeeding picture, so even more memory bandwidth is required.
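The ordering described above can be illustrated with a minimal sketch. It counts only the reference macro-blocks that each macro-block type must read, and it assumes 16×16 samples per macro-block. The names `REFERENCE_BLOCKS` and `min_reference_pixels` are hypothetical and chosen for illustration. The figure is a lower bound, since it ignores the candidate blocks fetched during motion estimation.

```python
# Number of reference macro-blocks read per macro-block type, as described above.
REFERENCE_BLOCKS = {
    "intra": 0,          # no reference picture is consulted
    "forward": 1,        # one block from the preceding picture
    "backward": 1,       # one block from the succeeding picture
    "bidirectional": 2,  # one block from each neighboring picture
}

MACROBLOCK_PIXELS = 16 * 16  # assumption: one 16x16 block of samples


def min_reference_pixels(mb_type):
    """Lower bound on reference samples fetched for one macro-block.

    Counts only the selected reference block(s), not the search itself.
    """
    return REFERENCE_BLOCKS[mb_type] * MACROBLOCK_PIXELS


for mb_type in REFERENCE_BLOCKS:
    print(mb_type, min_reference_pixels(mb_type))
```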
In addition, the memory bandwidth required while encoding a macro-block depends not only on the encoding type but also on the motion estimation method employed.
A conventional block-matching process of motion estimation is shown in FIG. 1. A current picture 120 is divided into blocks as shown in FIG. 1. Each block can be of any size. In the MPEG standard, for example, the current picture 120 is typically divided into macro-blocks with 16×16 pixels. Each block in the current picture 120 is encoded in terms of its difference from a block in a preceding picture 110 or a succeeding picture 130. During the block-matching process of a current block 100, the current block 100 is compared with similar-sized “candidate” blocks within a search range 115 of the preceding picture 110 or within a search range 135 of the succeeding picture 130. The candidate block of the preceding picture 110 or the succeeding picture 130 that is determined to have the smallest difference with respect to the current block 100, e.g. a block 150 of the preceding picture 110, is selected as a reference block. The motion vectors and residues between the reference block 150 and the current block 100 are computed and coded. As a result, the current block 100 can be restored during decompression using the coding of the reference block 150 as well as the motion vectors and residues for the current block 100.
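The block-matching process described above can be sketched as follows. This is a minimal illustration, not the implementation of any particular encoder. It assumes the sum of absolute differences (SAD) as the measure of "difference", which the text does not specify. It also assumes pictures stored as lists of rows of sample values and a current block that lies wholly inside the picture. The function names `sad` and `full_search` are hypothetical.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between the block of `cur` at (cx, cy)
    and the block of `ref` at (rx, ry)."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def full_search(cur, ref, cx, cy, size, search_range):
    """Compare the current block with every candidate within +/-search_range
    and return (motion_vector, cost, residual) for the best candidate."""
    height, width = len(ref), len(ref[0])
    best = None
    for my in range(-search_range, search_range + 1):
        for mx in range(-search_range, search_range + 1):
            rx, ry = cx + mx, cy + my
            # Skip candidates that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[0]:
                best = (cost, mx, my)
    cost, mx, my = best
    # Residual (prediction error) between current block and reference block.
    residual = [
        [cur[cy + dy][cx + dx] - ref[cy + my + dy][cx + mx + dx]
         for dx in range(size)]
        for dy in range(size)
    ]
    return (mx, my), cost, residual


# Synthetic example: the current picture is the reference picture displaced
# so that the best match lies 2 samples to the left and 1 sample down.
WIDTH = HEIGHT = 16
reference = [[y * WIDTH + x for x in range(WIDTH)] for y in range(HEIGHT)]
current = [[reference[(y + 1) % HEIGHT][(x - 2) % WIDTH] for x in range(WIDTH)]
           for y in range(HEIGHT)]

motion_vector, cost, residual = full_search(
    current, reference, cx=6, cy=6, size=4, search_range=3)
print(motion_vector, cost)  # (-2, 1) 0
```

In this synthetic case the match is exact, so the residual is all zeros. With real video the residual is generally non-zero and is coded together with the motion vector.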
A block-matching algorithm that compares the current block to every candidate block within the search range is called a “full search block-matching algorithm.” In general, a larger search area produces a better motion vector. However, the required memory bandwidth of a full search block-matching algorithm is proportional to the size of the search area. For example, if a full search block-matching algorithm is applied on a macroblock of size 16×16 pixels over a search range of ±N pixels with one-pixel accuracy, it requires (2×N+1)² block comparisons. For N=16, 1089 16×16 block comparisons are required. Because each block comparison requires 16×16, or 256, calculations, this algorithm consumes considerable memory bandwidth and is computationally intensive.
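The arithmetic above can be checked with a short sketch. The function name `full_search_cost` is hypothetical. The second returned value simply multiplies the number of block comparisons by the 256 calculations per comparison stated in the text.

```python
def full_search_cost(n, block=16):
    """Return (block comparisons, per-sample calculations) for a full search
    over a range of +/-n pixels with one-pixel accuracy."""
    comparisons = (2 * n + 1) ** 2
    return comparisons, comparisons * block * block


comparisons, calculations = full_search_cost(16)
print(comparisons, calculations)  # 1089 278784
```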
FIG. 2 depicts a simplified schematic diagram of a video encoding/decoding system 10 according to the related art. The encoding/decoding system 10 includes a video encoder 12, a video decoder 14, an audio codec 16, a control unit 18, a memory management unit 20, a display controller 22, and a memory 24. As shown in FIG. 2, in the conventional encoding/decoding system 10, all components share the memory 24 via the memory management unit 20. Each of the video encoder 12, the video decoder 14, and the audio codec 16 needs to access the memory 24 via the memory management unit 20. The memory 24 could be an external DRAM located outside the encoding/decoding system 10, and the memory management unit 20 could be a DRAM controller for accessing the external DRAM.
In order to overcome the disadvantages of the full search block-matching algorithm, a three-step hierarchical search algorithm is described by H. M. Jong et al. in “Parallel Architectures for 3-Step Hierarchical Search Block-Matching Algorithm,” IEEE Trans. on Circuits and Systems for Video Technology, Vol. 4, August 1994, pp. 407-416. In a first stage, a coarse search is performed over a reasonably large area. In successive stages of a conventional hierarchical search, the size of the search area is reduced. For the same search range, the hierarchical search method requires less memory bandwidth and fewer computations than the full search method but results in deterioration of the image quality.
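The coarse-to-fine idea can be sketched as follows. This is a common textbook formulation of a three-step search, with step sizes of 4, 2, and 1 and nine candidate positions per step, and is not taken from the cited paper. The SAD measure, the list-of-rows picture layout, and the names `sad` and `three_step_search` are assumptions made for illustration.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
        for dy in range(size)
        for dx in range(size)
    )


def three_step_search(cur, ref, cx, cy, size, first_step=4):
    """Coarse-to-fine search: test the 8 neighbors of the current best
    position at the current step size, move to the best one, halve the step.
    Returns (motion_vector, cost, number_of_block_comparisons)."""
    height, width = len(ref), len(ref[0])
    best_mx, best_my = 0, 0
    best_cost = sad(cur, ref, cx, cy, cx, cy, size)
    evaluations = 1
    step = first_step
    while step >= 1:
        center_mx, center_my = best_mx, best_my
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                if ox == 0 and oy == 0:
                    continue  # the center was already evaluated
                mx, my = center_mx + ox, center_my + oy
                rx, ry = cx + mx, cy + my
                # Skip candidates outside the reference picture.
                if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                    continue
                cost = sad(cur, ref, cx, cy, rx, ry, size)
                evaluations += 1
                if cost < best_cost:
                    best_cost, best_mx, best_my = cost, mx, my
        step //= 2
    return (best_mx, best_my), best_cost, evaluations


# Synthetic example: the best match lies 4 samples to the right.
W = H = 32
reference = [[y * W + x for x in range(W)] for y in range(H)]
current = [[reference[y][(x + 4) % W] for x in range(W)] for y in range(H)]

mv, cost, evaluations = three_step_search(current, reference, cx=12, cy=12, size=4)
print(mv, cost, evaluations)  # (4, 0) 0 25
```

With step sizes of 4, 2, and 1 the search covers a range of ±7 pixels using at most 25 block comparisons, whereas a full search over the same range needs 225. Because only a subset of the candidates is examined, the search may settle on a local minimum rather than the best match, which is consistent with the quality deterioration noted above.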
In U.S. Pat. No. 5,731,850, Maturi et al. disclose a combination of the full search method and the hierarchical search method. The search range is determined according to the encoding type of the current picture. If the search range is larger than a threshold, the hierarchical search method is adopted to implement the block-matching process. If the search range is less than or equal to the threshold, the full search method is adopted to implement the block-matching process. In this way, the image quality and system performance are balanced. Note that in the method of Maturi et al., the way the block-matching process is performed depends upon the size of the search range. However, if the system memory bandwidth is limited and the memory bandwidth required by the video encoder increases dramatically, the method of Maturi et al. cannot guarantee that the required memory bandwidth does not exceed the memory bandwidth available to the video encoder. As a result, the encoding performance decreases and the requirement of real-time encoding may not be achieved, because the system memory is not capable of supporting such a high memory bandwidth.
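The selection rule summarized above can be sketched as follows. This is only an illustration of the rule as described here, not code from the cited patent. The function name `choose_search_method`, the search ranges per picture type, and the threshold value are all hypothetical, since the text gives no concrete numbers.

```python
def choose_search_method(search_range, threshold):
    """Select the block-matching method from the search range alone."""
    return "hierarchical" if search_range > threshold else "full"


# Hypothetical values for illustration only; not taken from the cited patent.
HYPOTHETICAL_SEARCH_RANGE = {"P": 32, "B": 16}
HYPOTHETICAL_THRESHOLD = 16

for picture_type, search_range in HYPOTHETICAL_SEARCH_RANGE.items():
    print(picture_type, choose_search_method(search_range, HYPOTHETICAL_THRESHOLD))
```

The only inputs to the decision are the search range and the threshold. The memory bandwidth currently available to the video encoder is not consulted, which is the limitation noted above.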