Programmable logic devices (“PLDs”) are a well-known type of integrated circuit that can be programmed to perform specified logic functions. One type of PLD, the field programmable gate array (“FPGA”), typically includes an array of programmable tiles. These programmable tiles can include, for example, input/output blocks (“IOBs”), configurable logic blocks (“CLBs”), dedicated random access memory blocks (“BRAMs”), multipliers, digital signal processing blocks (“DSPs”), processors, clock managers, delay lock loops (“DLLs”), and so forth. Notably, as used herein, “include” and “including” mean including without limitation.
One such FPGA is the Xilinx Virtex™ FPGA available from Xilinx, Inc., 2100 Logic Drive, San Jose, Calif. 95124. Another type of PLD is the Complex Programmable Logic Device (“CPLD”). A CPLD includes two or more “function blocks” connected together and to input/output (“I/O”) resources by an interconnect switch matrix. Each function block of the CPLD includes a two-level AND/OR structure similar to those used in Programmable Logic Arrays (“PLAs”) and Programmable Array Logic (“PAL”) devices. Other PLDs are programmed by applying a processing layer, such as a metal layer, that programmably interconnects the various elements on the device. These PLDs are known as mask programmable devices. PLDs can also be implemented in other ways, for example, using fuse or antifuse technology. The terms “PLD” and “programmable logic device” include but are not limited to these exemplary devices, as well as encompassing devices that are only partially programmable.
For purposes of clarity, FPGAs are described below though other types of PLDs may be used. FPGAs may include one or more embedded microprocessors. For example, a microprocessor may be located in an area reserved for it, generally referred to as a “processor block.”
More recently, FPGAs have been used for video processing. In order to more conveniently send video over limited bandwidth networks, video compression is used. There are many known types of video compression, including that associated with the Moving Picture Experts Group (“MPEG”) among others. However, for purposes of clarity by way of example and not limitation, MPEG terminology is used.
In video compression, motion compensation may be used. Generally, a video sequence includes a number of pictures or frames. Frames in a sequence may be substantially similar, and thus contain a significant amount of redundant information. In video compression, this redundant information may effectively be removed by using a reference frame and a number of residual frames. As residual frames are indexed to a reference frame, they may contain less information than the reference frame. Accordingly, they may be encoded at a lower bit rate with the same quality as associated original frames from which they were obtained.
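The relationship between a reference frame and a residual frame can be illustrated with a minimal sketch. The tiny frames and the function names `residual` and `reconstruct` are hypothetical and chosen only for illustration; an actual encoder would also transform, quantize, and entropy code the residual.

```python
# Hypothetical 2x3 frames of pixel values; the numbers are illustrative only.
reference = [[10, 12, 14],
             [11, 13, 15]]
current = [[10, 12, 15],
           [11, 14, 15]]


def residual(cur, ref):
    """Per-pixel difference between a frame and its reference frame."""
    return [[c - r for c, r in zip(cur_row, ref_row)]
            for cur_row, ref_row in zip(cur, ref)]


def reconstruct(res, ref):
    """Add the residual back onto the reference to recover the frame."""
    return [[d + r for d, r in zip(res_row, ref_row)]
            for res_row, ref_row in zip(res, ref)]


res = residual(current, reference)
restored = reconstruct(res, reference)
```

Because the two frames are nearly identical, most residual values are zero, which is why a residual frame may be encoded with fewer bits than the frame it stands for.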
Another approach is to approximate motion of an entire scene and objects in a video sequence. Motion may be described by parameters which are encoded in a compressed video bitstream. Pixels of a predicted frame are approximated by translated pixels of a reference frame for motion estimation. Although this form of motion estimation may produce higher quality residual frames than the above-described motion compensation approach of subtracting differences between frames, the bit rate occupied by the parameters of this type of motion estimation may be significantly large.
In MPEG, frames are processed in groups. One frame, often the first frame of a group of frames, is encoded without motion compensation as a reference frame. This reference frame, which is an intracoded frame (“I-frame” or “I-picture”), is combined with predicted frames (“P-frames” or “P-pictures”). One or more P-frames may be predicted from a preceding I-frame or P-frame.
Furthermore, frames may be predicted from future frames. Such predicted frames from future frames may be predicted from two directions, such as for example from an I-frame and a P-frame that respectively immediately precede and follow the bidirectionally predicted frame. Conventionally, bidirectionally predicted frames are called “B-frames” or “B-pictures”. Other known details regarding MPEG video encoding are not described, as they are well known.
In block motion compensation (“BMC”), frames are partitioned into blocks, each of which is an array of pixels, sometimes referred to as “macroblocks.” Groups of macroblocks, where each group is associated with a frame, are known as “slices.” Each block is predicted from a block of equal size in a reference frame. A block taken from the reference frame is not transformed in any way other than being shifted to the position of the predicted block. This shift is represented as a motion vector. Such motion vectors are thus encoded into a compressed video bitstream.
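As a minimal sketch of the shift described above, a predicted block can be formed by copying an equal-sized block from the reference frame at a position displaced by the motion vector. The function name `predict_block`, the (dy, dx) vector layout, and the 6x6 reference frame are assumptions made for illustration; no bounds checking is shown.

```python
def predict_block(ref, top, left, mv, size):
    """Copy a size x size block from ref, displaced by mv = (dy, dx).

    (top, left) is the position of the block being predicted. The caller
    is assumed to keep the displaced block inside the reference frame.
    """
    dy, dx = mv
    return [row[left + dx:left + dx + size]
            for row in ref[top + dy:top + dy + size]]


# Hypothetical reference frame in which pixel (r, c) has value 10*r + c.
ref = [[10 * r + c for c in range(6)] for r in range(6)]

# Predict the 2x2 block at (2, 2) from the block one row up, one column right.
block = predict_block(ref, top=2, left=2, mv=(-1, 1), size=2)
```

Only the motion vector and the residual between the predicted block and the actual block would need to be encoded.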
Motion vectors need not be independent. For example, if two neighboring blocks are associated with the same moving object, their motion vectors may be differentially encoded to save bit rate. Accordingly, the difference between a motion vector and one or more neighboring motion vectors may be encoded. An entropy encoder/decoder (“CODEC”) may exploit the resulting statistical distribution of motion vectors, such as around a zero vector, for encoding or decoding video.
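Differential encoding of motion vectors can be sketched as follows. This example assumes the simplest possible predictor, namely the previously encoded neighbor, with a zero vector before the first block; actual standards define their own predictors, and the function names are hypothetical.

```python
def encode_differential(mvs):
    """Replace each motion vector by its difference from the previous one."""
    prev = (0, 0)  # assumed predictor for the first block
    diffs = []
    for mv in mvs:
        diffs.append((mv[0] - prev[0], mv[1] - prev[1]))
        prev = mv
    return diffs


def decode_differential(diffs):
    """Invert encode_differential by accumulating the differences."""
    prev = (0, 0)
    mvs = []
    for d in diffs:
        prev = (prev[0] + d[0], prev[1] + d[1])
        mvs.append(prev)
    return mvs


# Neighboring blocks on the same moving object have similar vectors.
mvs = [(3, -2), (3, -2), (4, -2), (4, -1)]
diffs = encode_differential(mvs)
```

The differences cluster around the zero vector, which is the kind of distribution an entropy coder may exploit.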
Blocks may be shifted by integer or non-integer vectors. Shifting by non-integer vectors is generally referred to as sub-pixel precision. Sub-pixel precision conventionally involves interpolating pixel values. To avoid discontinuities introduced at block borders, generally referred to as blocking artifacts, variable block-size motion compensation (“VBSMC”) may be used. VBSMC is BMC with the ability for an encoder to dynamically select block size. When encoding video, use of larger blocks may reduce the number of bits used to represent motion vectors. However, the use of smaller blocks may result in a smaller amount of prediction residual information to encode.
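Sub-pixel precision can be illustrated with a half-pixel sampler. The sketch below assumes plain bilinear averaging with rounding, which is only one possible interpolation filter; video standards typically specify their own filters. The function name and the half-pixel coordinate convention are hypothetical.

```python
def sample_half_pel(ref, y2, x2):
    """Sample ref at position (y2 / 2, x2 / 2), given in half-pixel units.

    Coordinates are assumed non-negative and far enough from the frame
    edge that the neighboring pixel exists.
    """
    y0, x0 = y2 // 2, x2 // 2
    y1 = y0 + (y2 % 2)  # equals y0 at integer positions
    x1 = x0 + (x2 % 2)  # equals x0 at integer positions
    total = ref[y0][x0] + ref[y0][x1] + ref[y1][x0] + ref[y1][x1]
    return (total + 2) // 4  # average of four taps, rounded


ref = [[10, 20],
       [30, 40]]

full_pel = sample_half_pel(ref, 0, 0)    # integer position
horizontal = sample_half_pel(ref, 0, 1)  # halfway between 10 and 20
vertical = sample_half_pel(ref, 1, 0)    # halfway between 10 and 30
diagonal = sample_half_pel(ref, 1, 1)    # center of all four pixels
```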
Furthermore, overlapped block motion compensation (“OBMC”) may be used to increase prediction accuracy and avoid or reduce blocking artifacts. OBMC blocks may be significantly larger in each dimension and overlap quadrant-wise with neighboring blocks. Thus, for OBMC, each pixel may belong to multiple blocks, and there are multiple predictions for each such pixel, which may be combined into a weighted mean. Accordingly, such blocks may be associated with a window function having the property that the sum of the overlapped windows is equal to one at every pixel.
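The weighted mean used in OBMC can be sketched in one dimension. The example assumes a linear ramp window and blocks that overlap their neighbors by half, so that the weights of two overlapping windows add up to one at every pixel; the window shape and function names are hypothetical, and an actual implementation would apply a two-dimensional window.

```python
from fractions import Fraction


def ramp_window(n):
    """Window of length 2n that rises linearly and then falls symmetrically."""
    rising = [Fraction(2 * k + 1, 2 * n) for k in range(n)]
    return rising + rising[::-1]


def overlap_sums(n):
    """Sum of the falling half of one window and the rising half of the next."""
    w = ramp_window(n)
    return [w[k + n] + w[k] for k in range(n)]


def blend(pred_a, pred_b):
    """Weighted mean of two predictions for the same n overlapping pixels."""
    n = len(pred_a)
    w = ramp_window(n)
    return [w[k + n] * pred_a[k] + w[k] * pred_b[k] for k in range(n)]


sums = overlap_sums(4)
flat = blend([8, 8, 8, 8], [8, 8, 8, 8])
```

Because the weights sum to one, two identical predictions blend to the same value, and differing predictions fade smoothly from one block into the next rather than meeting at a hard border.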
Block motion estimation or overlapped block motion estimation (“BME” or “OBME”, respectively) may be used to find an optimal or near optimal motion vector. The amount of prediction error for a block may be measured using a sum-of-absolute-differences (“SAD”) between predicted and actual pixel values over all pixels associated with a motion compensated region, which may be associated with a slice. Basically, optimal or near optimal motion vectors are calculated by determining block prediction error for each motion vector within a search range, and selecting the motion vector that effectively has a best compromise between the amount of error and the number of bits needed for motion vector data.
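A minimal sketch of SAD-based motion estimation over a search range follows. The rate term `mv_bits` is a hypothetical stand-in for the number of bits needed to code a vector, and the weight `lam` that trades error against bits is likewise an assumption; the reference frame is synthetic.

```python
def sad(block, ref, top, left):
    """Sum of absolute differences between block and ref at (top, left)."""
    return sum(abs(block[y][x] - ref[top + y][left + x])
               for y in range(len(block))
               for x in range(len(block[0])))


def mv_bits(dy, dx):
    """Hypothetical rate proxy: longer vectors are assumed to cost more bits."""
    return abs(dy) + abs(dx)


def full_search(block, ref, top, left, search, lam=0):
    """Test every vector within +/- search and keep the lowest cost."""
    size = len(block)  # square block assumed
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, x0 = top + dy, left + dx
            if y0 < 0 or x0 < 0 or y0 + size > height or x0 + size > width:
                continue  # candidate falls outside the reference frame
            cost = sad(block, ref, y0, x0) + lam * mv_bits(dy, dx)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost


# Synthetic 8x8 reference frame with no repeated 2x2 pattern near the block.
ref = [[3 * r * r + 5 * c * c + r * c for c in range(8)] for r in range(8)]

# The block to predict sits at (3, 3) and matches the reference at (4, 1).
block = [row[1:3] for row in ref[4:6]]
best_mv, best_cost = full_search(block, ref, top=3, left=3, search=2)
```

With `lam` greater than zero, a slightly worse match with a shorter vector may be preferred, which reflects the compromise between error and motion vector bits.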
A motion estimation calculation which tests all possible motion representations or blocks for such a search range is generally referred to as a full search optimization. A less time consuming approach than a full search optimization, though it is suboptimal with respect to rate distortion, involves use of a coarse search grid for a first approximation followed by refinement of such coarse search grid for areas surrounding this first approximation in one or more subsequent steps for producing one or more second approximations.
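The coarse-then-refine idea can be sketched as a step search that halves its grid spacing on each pass. The callable `cost` is an assumption standing in for any block prediction error measure, such as SAD plus a rate term, and the quadratic toy cost below is purely illustrative.

```python
def coarse_to_fine_search(cost, search_range, first_step):
    """Search a coarse grid, then refine around the best point found.

    cost(dy, dx) returns the prediction cost of a candidate vector.
    Returns the chosen vector, its cost, and the number of cost evaluations.
    """
    best_mv = (0, 0)
    best_cost = cost(0, 0)
    evaluations = 1
    step = first_step
    while step >= 1:
        cy, cx = best_mv  # center of this pass
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                y, x = cy + dy, cx + dx
                if (dy, dx) == (0, 0):
                    continue  # center was already evaluated
                if max(abs(y), abs(x)) > search_range:
                    continue  # outside the allowed search range
                evaluations += 1
                c = cost(y, x)
                if c < best_cost:
                    best_mv, best_cost = (y, x), c
        step //= 2  # refine the grid
    return best_mv, best_cost, evaluations


def toy_cost(dy, dx):
    """Hypothetical smooth cost surface with its minimum at (3, -1)."""
    return (dy - 3) ** 2 + (dx + 1) ** 2


mv, cost_at_mv, evaluations = coarse_to_fine_search(
    toy_cost, search_range=7, first_step=4)
```

For a search range of plus or minus 7, a full search would test 15 × 15 = 225 vectors, whereas this sketch tests 25. On a cost surface with several local minima, the coarse pass may settle on the wrong region, which is why the approach is suboptimal.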
A more computationally intensive and higher image quality form of BME than SAD is to determine the sum-of-square differences (“SSD”). It should be appreciated that motion estimation may be substantially calculation intensive. The number of calculations may vary with the resolution of the image. For example, for High-Definition television (“HDTV”) there may be approximately two million pixels in a frame where each pixel is motion estimated with blocks of a block size of 16 pixels by 16 pixels. Furthermore, this calculation is done for viewing at 30 frames per second (“fps”). For an SSD value, namely $\sum_{i=0}^{n} (A_j - B_i)^2$ where $i$ goes from 0 to $n$, each $B_i$ is compared one at a time with an $A_j$. $A_j$ is for a macroblock, and $B_i$ is for a block of pixels in a reference image. There may be 0 to $m$ macroblocks in a slice, and thus $j$ may be from 0 to $m$. For a conventional SSD implementation, squaring adds a significantly complex, as well as resource costly, multiplier stage.
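The difference between SAD and SSD, and the scale of the multiplier workload, can be sketched as follows. The frame size of 1920 by 1088 pixels and the search range of plus or minus 16 pixels are assumptions chosen only to make the arithmetic concrete, as are the function names; a full search with one squaring per pixel per candidate is also assumed.

```python
def sad(a, b):
    """Sum of absolute differences: needs no multiplier."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ssd(a, b):
    """Sum of squared differences: one multiplication per pixel."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def multiplies_per_second(width, height, block, search, fps):
    """Squarings needed for a full search SSD over every block of a frame."""
    blocks_per_frame = (width // block) * (height // block)
    candidates_per_block = (2 * search + 1) ** 2
    pixels_per_block = block * block
    return blocks_per_frame * candidates_per_block * pixels_per_block * fps


# Flattened pixel values of a hypothetical macroblock and candidate block.
a = [1, 4, 6]
b = [2, 2, 9]
sad_value = sad(a, b)
ssd_value = ssd(a, b)

# Assumed HDTV-like parameters; see the lead-in for which are assumptions.
load = multiplies_per_second(width=1920, height=1088, block=16,
                             search=16, fps=30)
```

Under these assumptions the count is on the order of tens of billions of multiplications per second, which illustrates why the squaring stage is costly in a conventional SSD implementation.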
Accordingly, it would be desirable and useful to provide SSD quality motion estimation. Furthermore, it would be desirable and useful to provide such motion estimation that would be reasonable for implementation in an FPGA or other integrated circuit using DSP blocks.