Television signals with a scanning rate of 30 frames per second contain a significant amount of frame-to-frame redundancy. For video conferencing applications, in particular, motion in a scene is usually low so that the frame-to-frame data redundancy is high. In such a case, coding techniques can be used to reduce the frame-to-frame data redundancy and achieve a high degree of data compression.
Consider first and second successive video frames arising in a video conferencing application. Illustratively, the difference between the two successive frames results from motion of an object such as the arm or hand of a person. If this motion is confined to a small region of the frame and if the first frame is available at the receiver, then instead of transmitting the entire second frame to the receiver, it is only necessary to transmit the motion information to the receiver. Upon receipt of the motion information, the receiver is able to construct the second frame from the first. In particular, the motion information is obtained by estimating the displacement of the moving object between the second frame and the first frame. The estimated displacement is then transmitted to the receiver so that the receiver can construct the second frame from the first.
Using this method of transmitting video data, it is necessary to transmit the full frame data for only one out of every two frames so that a significant amount of data compression is achieved. This technique of data compression is called motion compensation and plays an important role in various video codecs.
Several methods for estimating the displacement of an object in a video sequence have been proposed. Generally, they can be classified into two types: pixel recursive algorithms (see, e.g., A. N. Netravali et al. "Motion Compensated Television Coding: Part I" BSTJ Vol. 58, pp. 631-670, Mar. 1979; and K. A. Prabhu et al. "Pel-Recursive Motion Compensated Color Codecs", Proceedings of ICC 82 pp. 2G.8.1-2G.8.5, Philadelphia, PA, June 1982) and block matching algorithms (see, e.g., J. R. Jain et al. "Displacement Measurement and Its Application in Interframe Image Coding" IEEE Trans. on Commun., Vol. COM-29, pp. 1799-1808, Dec. 1981). Here, the concern is with block matching algorithms.
In a block matching algorithm, the current (i.e. the second) frame is divided into blocks of pixels. For example, if a frame is 256×256 pixels, it may be divided into two hundred and fifty-six M×N blocks where M and N are both 16 pixels. The purpose of the block matching algorithm is to obtain a displacement vector for each block of pixels in the current frame. A displacement vector indicates the displacement of a block relative to its location in the previous (i.e. the first) frame. These displacement vectors are then transmitted to the receiver so that the receiver can construct the current (i.e. the second) frame from the previous (i.e. the first) frame. In applications with relatively low motion levels, such as a video conference, many of the displacement vectors are zero.
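The receiver-side step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the circuit or method of the invention: the function name `reconstruct_frame`, the dictionary layout of `vectors`, the sign convention (a vector `(dy, dx)` points from a block's position in the current frame to its matching position in the previous frame), and the clamping of coordinates at the frame border are all assumptions introduced for the example.

```python
def reconstruct_frame(previous, vectors, m, n):
    """Build the current frame from the previous frame and per-block vectors.

    previous: 2-D list of pixel values (the first frame).
    vectors:  dict mapping (block_row, block_col) -> (dy, dx); hypothetical
              layout chosen for this sketch.
    m, n:     block height and width in pixels.
    """
    height, width = len(previous), len(previous[0])
    current = [[0] * width for _ in range(height)]
    for (block_row, block_col), (dy, dx) in vectors.items():
        top, left = block_row * m, block_col * n
        for i in range(m):
            for j in range(n):
                # Assumption: source coordinates that fall outside the
                # frame are clamped to the nearest border pixel.
                r = min(max(top + i + dy, 0), height - 1)
                c = min(max(left + j + dx, 0), width - 1)
                current[top + i][left + j] = previous[r][c]
    return current
```

With all vectors equal to `(0, 0)`, as is common in low-motion scenes, the reconstructed frame is simply a copy of the previous frame.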
To calculate a displacement vector for a block of pixels in the current frame, a similarity calculation is performed between the block of the current frame and each of a plurality of equal sized blocks laid out in a generally larger search area of the previous frame. The block of pixels in the current frame and the search area in the previous frame generally have the same center. An error function such as the mean absolute error or mean square error is calculated as a similarity measurement for each position of the block of the current frame in the search area. The displacement vector is the displacement between the center of the search area and the center of the block in the search area which yields the minimum error when compared with the block from the current frame.
More particularly, a search area in a previous frame is searched by placing a block of pixels from the current frame at the upper left-hand corner of the search area and calculating the error (mean square or mean absolute) with respect to the overlapped pixels in the search area. The block from the current frame is then moved pixel by pixel to the right-hand boundary of the search area. At each step the error with respect to the overlapped pixels of the search area is calculated. The block of the current frame is then moved down one row of pixels in the search area, and the block is again moved pixel by pixel from the left-hand boundary of the search area to the right-hand boundary, the error with respect to the overlapped pixels of the search area being calculated at each step. The block of pixels from the current frame is then moved down another row and moved from left to right pixel by pixel, and so on. This process is continued until an error function (mean square or mean absolute) has been calculated for all possible block positions in the search area (hence the name "full search block matching algorithm"). The calculated mean errors are compared and the block position that produces the minimum error defines the displacement vector for the block.
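The raster-order search just described can be illustrated with a short software sketch. It is not the VLSI implementation; the function names, the choice of the mean absolute error, the rule of skipping candidate positions that extend past the frame border, and the tie-breaking rule (the first minimum encountered is kept) are assumptions made for the example.

```python
def mean_absolute_error(block, frame, top, left):
    """Mean absolute error between block and the equal-sized region of
    frame whose upper left-hand corner is at (top, left)."""
    m, n = len(block), len(block[0])
    total = 0
    for i in range(m):
        for j in range(n):
            total += abs(block[i][j] - frame[top + i][left + j])
    return total / (m * n)


def full_search(current, previous, top, left, m, n, t):
    """Return ((dy, dx), error) for the m-by-n block of `current` at
    (top, left), searching displacements of up to t pixels in each
    direction in `previous`."""
    block = [row[left:left + n] for row in current[top:top + m]]
    height, width = len(previous), len(previous[0])
    best_vector, best_error = (0, 0), None
    # Scan row by row, left to right, starting from the upper
    # left-hand corner of the search area.
    for dy in range(-t, t + 1):
        for dx in range(-t, t + 1):
            r, c = top + dy, left + dx
            # Assumption: candidates extending past the frame are skipped.
            if r < 0 or c < 0 or r + m > height or c + n > width:
                continue
            error = mean_absolute_error(block, previous, r, c)
            if best_error is None or error < best_error:
                best_vector, best_error = (dy, dx), error
    return best_vector, best_error
```

For a block well inside the frame, the two nested loops evaluate all (2T+1)×(2T+1) candidate positions, which is the source of the computational load of the full search.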
The full search block matching procedure described above demands a very large amount of computation. Consider, for example, the specifications:
image size = 256×256 pixels;
frame rate = 15 frames per second;
block size in current frame = M×N where M=16 and N=16; and
search area size in previous frame = (M+2T)×(N+2T) where T=8 pixels.
The full search procedure for each block in the current frame requires (2T+1)^2 = 289 distinct mean error calculations involving the block of pixels from the current frame and an overlapping block of pixels from the search area. Each such error calculation requires 255 additions since there are 256 pixels in each block. To accomplish the task for the entire current frame of 256 blocks in 1/15 seconds, an 8-bit wide adder must perform each addition in about 3.5 nanoseconds if the additions are done sequentially. This is a very severe speed requirement. If the motion compensation technique were to be applied to an NTSC signal comprising 512×512 pixels (rather than the 256×256 pixel example described above), the adder would have to perform each addition in 0.4375 ns, an eightfold reduction that corresponds to four times as many blocks per frame processed at 30 rather than 15 frames per second. Such a speed is clearly unapproachable using currently available technology. In addition, the access time of the memories storing the pixel values is of the same order as the adder time. Such an access time cannot be readily achieved with current technology.
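The adder-speed figures above follow from a simple count of additions, which can be reproduced as below. The function name `adder_time_ns` and its parameters are hypothetical, and the 30 frames per second rate used for the 512×512 case is an assumption inferred from the eightfold ratio between the two quoted figures.

```python
def adder_time_ns(width, height, fps, m, n, t):
    """Time available per addition, in nanoseconds, if every addition
    of the full search is performed sequentially by a single adder."""
    blocks_per_frame = (width // n) * (height // m)
    candidates_per_block = (2 * t + 1) ** 2
    additions_per_candidate = m * n - 1  # summing m*n terms takes m*n - 1 additions
    additions_per_second = (blocks_per_frame * candidates_per_block
                            * additions_per_candidate * fps)
    return 1e9 / additions_per_second


# Example specification from the text: 256x256 pixels, 15 frames per second.
small = adder_time_ns(256, 256, 15, 16, 16, 8)
# 512x512 pixels; the 30 frames per second rate is an assumption.
large = adder_time_ns(512, 512, 30, 16, 16, 8)
print(f"{small:.2f} ns, {large:.4f} ns")
```

The first value rounds to the 3.5 ns quoted in the text. The second is exactly one eighth of the first, which matches the quoted 0.4375 ns when the rounded 3.5 ns figure is divided by eight.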
In light of the severe computational demands of the full search block matching procedure, some fast but non-full-search algorithms have been proposed. (See, e.g., J. R. Jain et al. "Displacement Measurement and Its Application in Interframe Image Coding", IEEE Trans. on Commun., Vol. COM-29, pp. 1799-1808, Dec. 1981; and T. Koga et al. "Motion Compensation Interframe Coding for Video Conferencing", NTC 81 Proc. pp. G5.3.1-G5.3.5, New Orleans, LA, Nov. 29-Dec. 3, 1981.) Although these simplified, non-full search methods reduce the computational burden, they do not provide an optimal implementation of the block matching algorithm. It is recognized that the performance of the full search block matching procedure is the best among all block matching search procedures. For low-bit rate (e.g. 64-128 kb/sec) video, the difference in performance between the full search and non-full search algorithms can be significant.
In Roth et al., "A VLSI for Motion Compensation", pp. 13.1-13.2 PCS 87, a VLSI chip for implementing a full search block matching algorithm is disclosed. The chip is designed for processing 8×8 blocks. The data flow within the chip requires a large amount of buffer capacity, which causes the chip to have a relatively large size. This prevents the design of the Roth et al. reference from being practical for processing a block larger than 8×8 pixels. If a chip capable of processing 16×16 blocks is needed, then the design in the Roth et al. reference results in a chip whose size is beyond reasonable cost and which may also involve unacceptable processing delays.
Accordingly, it is an object of the present invention to provide a circuit, implementable in VLSI, for carrying out a full search block matching algorithm for the compression of video data. It is a further object of the present invention to provide a VLSI circuit of reasonable size and cost for implementing a block matching algorithm, which algorithm can efficiently handle both 8×8 and 16×16 pixel blocks.