1. Field of the Invention
This invention pertains generally to video motion compensation, and more particularly to a method of providing overlapped motion compensation of video by utilizing a triple-buffering method.
2. Description of the Background Art
Motion compensation of video streams, such as video recordings, is important for reducing spurious movement which detracts from the professional appearance of the resultant video. One traditional method of obtaining motion compensation is by utilizing overlapped motion compensation. Typically, overlapped motion compensation is performed with MPEG-4 bit streams wherein the flag obmc_disable=0, or when the optional “advanced prediction mode” is enabled within an H.263 bit stream. Each of the pixels within an eight-by-eight (8*8) luminance prediction block is provided as a weighted sum of three prediction values which is divided by eight (8), and preferably subject to rounding. To arrive at the three prediction values, three motion vectors are generally utilized: the motion vector of the current luminance block, as well as two out of four “remote” vectors. The remote vectors may comprise the motion vector of the block at either the left or right side of, or above or below, the current luminance block.
To obtain a value for each pixel, the remote vectors of the blocks at the two nearest block borders are utilized. For example, within the upper half of the block the motion vector corresponding to the block above the current block is selected for use, while for the left half of the block, the motion vector corresponding to the block at the left side of the current block is utilized.
The generation of each pixel p(i,j) in an 8*8 luminance prediction block is governed by the following equation:
p(i,j) = ((q(i,j)×H0(i,j)) + (r(i,j)×H1(i,j)) + (s(i,j)×H2(i,j)) + 4) // 8
wherein q(i,j), r(i,j), and s(i,j) are pixels from the referenced picture defined by:
q(i,j) = p(i+MVx0, j+MVy0)
r(i,j) = p(i+MVx1, j+MVy1)
s(i,j) = p(i+MVx2, j+MVy2)
In the equations above, (MVx0, MVy0) denotes the motion vector of the current block, (MVx1, MVy1) denotes the motion vector of the block either above or below the current block, and (MVx2, MVy2) denotes the motion vector of the block either to the left or to the right of the current block. The matrices H0(i,j), H1(i,j), and H2(i,j) are defined within both the MPEG-4 and the H.263 coding standards.
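The weighted sum above may be easier to follow as a short sketch. The Python below is illustrative only and is not taken from either standard: the function names, the argument layout, and the restriction to integer-pel motion vectors that stay inside the reference picture are assumptions made for this example. The weight matrices H0, H1, and H2 are passed in by the caller and are not reproduced here. The choice of remote vector per half of the block follows the nearest-border rule described above.

```python
def obmc_pixel(q, r, s, h0, h1, h2):
    """Weighted sum of three predictions, plus 4 for rounding, divided by 8."""
    return (q * h0 + r * h1 + s * h2 + 4) // 8


def obmc_block(ref, bx, by, mv_cur, mv_above, mv_below, mv_left, mv_right,
               H0, H1, H2):
    """Predict one 8x8 luminance block whose top-left corner is (bx, by).

    ref is the reference picture as ref[y][x]; every motion vector is an
    (MVx, MVy) pair of integers. Hypothetical simplification: no half-pel
    interpolation and no bounds handling, so the caller must keep every
    displaced coordinate inside the picture.
    """
    out = [[0] * 8 for _ in range(8)]
    for j in range(8):                                # row inside the block
        mv_v = mv_above if j < 4 else mv_below        # nearest horizontal border
        for i in range(8):                            # column inside the block
            mv_h = mv_left if i < 4 else mv_right     # nearest vertical border
            x, y = bx + i, by + j
            q = ref[y + mv_cur[1]][x + mv_cur[0]]     # current-block vector
            r = ref[y + mv_v[1]][x + mv_v[0]]         # above/below vector
            s = ref[y + mv_h[1]][x + mv_h[0]]         # left/right vector
            out[j][i] = obmc_pixel(q, r, s, H0[j][i], H1[j][i], H2[j][i])
    return out
```

Because the three weights at each position sum to eight in this sketch's usage, a block whose three vectors all point at the same reference pixels reproduces those pixels unchanged; the overlap only has an effect where neighboring vectors differ.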
FIG. 1 illustrates a multiprocessing video decoder comprising a first processor for a sequential task and a transfer buffer which connects to one or more processors for additional tasks. This multiprocessing architecture separates data-dependent operations, which are to be performed sequentially, from the data-independent operations, which may be performed on a macroblock basis. According to the multiprocessor paradigm, performance is subject to the efficiency with which the data is transferred between processors. The data being transferred is structured in transfer units, such as block-level transfers. It will be appreciated that the data is typically transferred utilizing conventional double-buffering methods, which were described for the multiprocessing architecture incorporated by reference.
Each data transfer executed between processors reduces the amount of processor bandwidth available for task execution. It should be appreciated that data transfers made between processing elements are typically expensive in relation to the amount of processing time which is utilized. The expense of the transfer is exacerbated when the buffer is implemented with slow memory devices in order to reduce the implementation costs, since the access time of the memory is long in comparison with the time required to perform a general arithmetic operation. Conventional double-buffering is typically utilized to hide the delay caused by accessing the memory device(s) of the buffer.
FIG. 2 illustrates the use of double-buffering of transfers between processors within the referenced multiprocessing decoder architecture. The double-buffer mechanism comprises a first buffer and a second buffer. The decoded data MB(x) at time x along the time axis can be transferred to a transfer buffer from which the data is delivered to the multiple processing elements. It will be appreciated that while the transfer from the second buffer is occurring, the processor can continue processing the next macroblock because the first buffer is available for use. Therefore, the time interval over which instructions may be processed is not subject to the time lost during the transfer from the buffer.
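The benefit of overlapping decode and transfer can be illustrated with a small timing model. The sketch below is a hypothetical simplification rather than a description of the referenced decoder: it assumes a fixed decode time and a fixed transfer time per macroblock, a single transfer channel, and buffers that are reused in round-robin order. The function name and parameters are invented for this example.

```python
def pipeline_time(n_blocks, decode_t, transfer_t, n_buffers):
    """Return the time at which the last macroblock has been transferred.

    Assumed model: a block is decoded into buffer (k % n_buffers), which
    must be empty first; its transfer starts as soon as decoding ends and
    the single transfer channel is idle.
    """
    buffer_free = [0] * n_buffers   # time at which each buffer is empty again
    decoder_free = 0                # time at which the decoder is idle
    channel_free = 0                # time at which the transfer channel is idle
    for k in range(n_blocks):
        b = k % n_buffers
        decode_start = max(decoder_free, buffer_free[b])
        decoder_free = decode_start + decode_t
        transfer_start = max(decoder_free, channel_free)
        channel_free = transfer_start + transfer_t
        buffer_free[b] = channel_free
    return channel_free
```

Under these assumptions, with four macroblocks, a decode time of 3 units, and a transfer time of 2 units, a single buffer serializes the work (20 units), whereas two buffers let every transfer except the last overlap with decoding (14 units).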
Conventional decoder systems utilize a double-buffering mechanism; however, these systems can suffer from performance losses when operating in an overlapped motion compensation mode. It has been determined that one of the reasons for the slow performance within existing systems is the need for a subsequent, or future, macroblock in order to process the vector information.
FIG. 3 depicts a macroblock comprising four block positions, one through four.
FIG. 4 depicts the relationship between a current macroblock being decoded and four neighboring macroblocks. The values MVu, MVl, MVr, and MVd represent the motion vectors, wherein MVu is in block three of the upper macroblock, MVl is in block two of the left macroblock, MVr is in block two of the current macroblock, and MVd is in block three of the current macroblock. The motion vector of block one in the current macroblock is represented by MVc. It will be appreciated, as depicted within the figure, that all motion vectors are available at the time the header information of the current macroblock is decoded, since the block does not refer to any future macroblocks. As a result, overlapped motion compensation can be executed immediately after decoding the current macroblock.
FIG. 5 and FIG. 6 depict two situations wherein a future motion vector is required in order to perform overlapped motion compensation within the current macroblock. The motion vector MVr is carried by the next macroblock, on the right side in the spatial domain. It will be appreciated that since the right-side macroblock has not yet been decoded, the information is not available.
FIG. 7 and FIG. 8 illustrate a few drawbacks associated with the use of double buffering within the multiprocessing architecture described. As a macroblock is being decoded at (x+1) in FIG. 7, the decoded macroblock information at x cannot be transferred because the next motion vectors have not yet been decoded. Therefore, the motion vectors of the macroblock at (x+1) are inserted within the macroblock data. Upon decoding the subsequent motion vector, the data in buffer 2 is transferred as shown in FIG. 8. It will be appreciated, however, that buffer 1 cannot be updated as it has not yet been transferred, and the processor cannot commence decoding the next macroblock as no buffer is available.
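The stall described above can be reproduced with a rough timing model. The sketch is a hypothetical simplification and not the behavior of any particular decoder: it assumes fixed decode and transfer times, a single transfer channel, round-robin buffer reuse, and that a macroblock may leave its buffer only after the following macroblock has been fully decoded (the last macroblock has no successor and is released once its own decoding ends). The function name and parameters are invented for this example.

```python
def obmc_pipeline_time(n_blocks, decode_t, transfer_t, n_buffers=2):
    """Return the completion time when each transfer waits on the next block.

    Assumed model: block k occupies buffer (k % n_buffers) until it has been
    transferred, and its transfer cannot start before block k+1 is decoded,
    because block k needs a motion vector from block k+1.
    """
    if n_buffers < 2:
        raise ValueError("this model needs at least two buffers")
    transfer_end = []    # transfer_end[k]: time at which block k has left its buffer
    decoder_free = 0     # time at which the decoder is idle
    channel_free = 0     # time at which the transfer channel is idle
    for k in range(n_blocks):
        # The buffer for block k was last used by block k - n_buffers.
        buffer_free = transfer_end[k - n_buffers] if k >= n_buffers else 0
        decoder_free = max(decoder_free, buffer_free) + decode_t
        if k > 0:
            # Block k-1 now has the vectors it needs and can be transferred.
            channel_free = max(decoder_free, channel_free) + transfer_t
            transfer_end.append(channel_free)
    if n_blocks > 0:
        # The final block has no successor to wait for.
        channel_free = max(decoder_free, channel_free) + transfer_t
        transfer_end.append(channel_free)
    return channel_free
```

With four macroblocks, a decode time of 3 units, a transfer time of 2 units, and two buffers, this model finishes at 20 units: from the third macroblock onward the decoder waits for a full transfer before it can start, so the overlap that double-buffering normally provides is lost.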
Therefore, a need exists for a buffering mechanism for use with overlapped motion compensation that does not create the performance bottleneck outlined above and does not add undue complexity to the video decoder. The present invention satisfies that need, as well as others, and overcomes the deficiencies of previously developed buffering mechanisms.