This invention relates to digital-video compression systems, and more particularly to motion-estimation calculators.
Personal computers, digital-video cameras, and newer consumer electronic devices commonly use video in a digital format. Since a huge amount of memory or storage space is required to fully store all 30 or more frames per second of video, the images are usually compressed. Often sequential images in the video sequence differ only slightly. The difference from a previous (or following) image in the sequence can be detected and encoded, rather than the entire picture. Such lossy compression techniques are widely used, such as MPEG encoding.
During compression or encoding, each frame or image is divided into a grid of macroblocks. Each macroblock contains 16xc3x9716 pixels. A macroblock from a current frame or picture is compared to a range of macroblocks in a previous picture in the video sequence. Often a match or near-match is found at a different location. The difference in locations is known as a motion vector, since it indicates the movement of the macroblock between the two pictures. The motion vector rather than the entire macroblock can then be encoded for the new picture, saving storage space.
FIG. 1 illustrates motion estimation for a pair of digital-video pictures. A current picture 10 is compared to an old picture 12 in a video sequence. Old picture 10 could occur either before or after current picture 10 in the sequence when backward and forward estimation are used.
Macroblock 16 in current picture 10 is selected and compare to all macroblocks within search range 14 in old picture 12. A match or near-match is found with macroblock 16xe2x80x2. because of movement of subjects in the pictures, macroblock 16xe2x80x2 from old picture 12 has moved to a new location in current picture 10. The difference in locations of macroblocks 16, 16xe2x80x2 is indicated by motion vector 18.
Rather than store all 16xc3x9716 pixels of macroblock 16 in the encoded video stream, only motion vector 18 and an identifier for macroblock 16xe2x80x2 need to be included. This reduces or compresses the size of the video stream.
Although the video stream is compressed, large numbers of calculations are needed for motion estimation. The various macroblocks within a search range are usually evaluated by a sum of the absolute difference (SAD) method. The macroblock 16xe2x80x2 with the smallest SAD is the closest match to the macroblock 16 being searched.
For a search range of +/xe2x88x92127 and +/xe2x88x9263 pixels, a total of 32K macroblocks are evaluated, requiring 32K SAD operations. Each SAD operation requires 256 (16xc3x9716) subtractions, 256 absolute-value operations, and 255 2-input additions, a total of 767 arithmetic operations. A full search for one macroblock thus requires 32Kxc3x97767 or about 24M calculations.
A 720xc3x97480-pixel picture has 1350 macroblocks, each of which may move independently and thus must be motion-estimated. So a total of 1350xc3x9724M or 32G calculations are needed per picture. For a video having 30 frames per second, about 1 trillion operations per second are needed (1T ops/sec). Thus full motion estimation requires large computing resources.
While a general-purpose arithmetic-logic-unit (ALU) or digital-signal processor (DSP) can be used for performing the SAD operations, a specialized calculator for these SAD operations is desirable. A specialized SAD calculator that reduces the processing time of the frequent SAD operations would improve compression performance.
Sum-of-Absolute-Difference Calculationsxe2x80x94FIG. 2
FIG. 2 shows a straightforward approach to performing the Sum-of-Absolute-difference (SAD) operation. When comparing 2 macroblocks for motion estimation, a measure of the visual differences between the 2 macroblocks is obtained by comparing pixels. Different numeric values of the pixels are used to represent different colors and shades. Each pixel in one macroblock can be compared to the corresponding pixel in the other macroblock by subtracting the numeric values of the 2 pixels. When the pixels are identical, the difference is zero. Pixels that differ only slightly in color will have a small difference. Since the difference can be either positive or negative, the absolute value of the difference is taken.
The difference and absolute values of all other pairs of pixels in the 2 macroblocks are also generated. Finally, all of the absolute differences are summed to obtain one data value that represents the overall difference between the 2 macroblocks. The closer this sum is to zero, the better the 2 macroblocks match. When many possible macroblock pairs are compared in this way, the macroblock with the smallest sum is the best match. The motion vector is then the location difference between the 2 macroblocks.
In FIG. 2, macroblock A with pixels A1, A2, A3, A4 . . . is being compared to macroblock B with pixels B1, B2, B3, B4 . . . . Pixel A1 corresponds to pixel B1, pixel A3 to B3 etc, since each pair of pixels is in the same relative location within the macroblocks being compared. Subtractors 20 each receive an A pixel value and a B pixel value and generated the difference. A full adder with a 2""s complementor can be used for subtractor 20 if the 2""s complement of the B pixel is taken prior to its being input to the adder.
Subtractors 20 thus generate the pixel differences A1-B1, A2-B2, A3-B3, and A4-B4. The most-significant-bit (MSB) of each difference output from adders 20 is the sign bit. When the sign bit is zero, the difference is positive, and mux 24 selects the output from adder 20 as the absolute difference. However, when the sign bit (MSB) is a one, the difference from subtractor 20 is negative, and the 2""s complement of the difference is selected by mux 24. The 2""s complement of each difference is generated by negation logic 22. Negation logic 22 inverts all bits in the difference and adds one to obtain the 2""s complement.
Muxes 24 thus output only positive values, since negative differences from adders 20 are negated by negation logic 22 and selected by mux 24, while positive differences are passed through mux 24 unchanged.
Final adders 26, 28, 30 perform the summing function, generating the sum of all the absolute differences output from muxes 24. Additional final adders and absolute-difference subtractors 20 can be added for comparing more pixels at a time. The final output from adder 30 represents the SAD for the four pairs of pixels compared.
Full adders or faster adders can be used for the final adders, such as described by Cash et al. in U.S. Pat. No. 4,887,233. Other kinds of absolute-difference generators can be used for each pair of pixel inputs, such as the absolute difference generator described by McManigal in U.S. Pat. No. 4,218,751.
FIG. 3 shows negation logic for generating the 2""s complement. Negation logic 22 inverts all bits of input IN using inverters 32. This is a bit-wise inversion. For example, input 010011 is inverted by inverters 32 to 101100. Adder 34 then adds 1 to the bit-inverted input from inverters 32. For example, the value 101100 from inverters 32 is incremented by adder 34 to produce output OUT of 101101, which is the 2""s complement of input IN.
FIG. 4 shows that the increment for 2""s complementation can be performed by a downstream adder using the carry-in. Adder 38 and inverters 36 perform a subtraction operation, generating A-B. Input B is inverted bit-by-bit using inverters 36, and the bit-wise inversion of B (Bxe2x80x2) is input to adder 38. Input A is applied directly to the other input of adder 38.
Adder 38 would normally generated A+Bxe2x80x2. However, the carry-in CIN to the least-significant-bit (LSB) is set to one. Setting the carry-in to 1 effectively adds one to the sum. Adder 38 thus generates A+Bxe2x80x2+1. Since the 2""s complement of B (xe2x88x92B) is Bxe2x80x2+1, adder 38 outputs A-B.
What is desired is a sum-of-absolute-difference calculator for use with a video compressor. It is desired to use a dedicated, specialized arithmetic unit to generate the sum of absolute differences (SAD) of pixel pairs in 2 macroblocks being compared for motion estimation searches. It is desired to reduce logic paths and circuit propagation delays in a SAD calculation circuit. reduction of logic stages in the SAD unit is desired.
A Sum-of-Absolute-Difference (SAD) calculator has a plurality of k pixel-pair inputs. Each pixel-pair input includes a first pixel input receiving a first pixel from a first macroblock and a second pixel input that receives a second pixel from a second macroblock. A plurality of k subtractors each generates a difference by subtracting a numeric value of the second pixel from a numeric value of the first pixel.
A plurality of k inverter banks is each coupled to the difference from one of the k subtractors, to generate bit-inverted differences. A plurality of k muxes, each receive a difference from one of the k subtractors and a bit-inverted difference from an inverter bank that receives the difference.
A sign bit for each difference is generated by each subtractor. The sign bit indicates when a difference represents positive or a negative value.
Each mux in the plurality of muxes is controlled by a sign bit for the difference input to the mux. Each mux outputs the difference when the sign bit indicates that the difference represents a positive value, but each mux outputs the bit-inverted difference when the sign bit indicates that the difference represents a negative value.
A plurality of kxe2x88x922 summing adders each has 2 multi-bit inputs and a single-bit carry input. A first level of the summing adders receive outputs from the plurality of muxes. The plurality of summing adders output a first partial sum and a second partial sum. Each of the kxe2x88x922 summing adders receives as the single-bit carry input a sign bit for a different one of the differences. The kxe2x88x922 summing adders compensate for bit-wise inversion rather than 2""s complementing of all but 2 of the differences.
A half-adder array receives the first and second partial sums from the plurality kxe2x88x922 summing adders. It generates a sum and a carry bit for each bit position by adding each bit position of the first partial sum to a same bit position of the second partial sum. The half-adder array outputs all sum bits as a sums output and all carry bits as a carries output.
A final adder has 2 multi-bit inputs and a single-bit carry input. It receives the sums output and the carries output from the half-adder array. The final adder receives at the single-bit carry input the sign bit for a difference generated by one of the k subtractors. The final adder sums the sum and carry bits output by the half-adder array with the sign bit and propagates carries to generate a SAD value. The SAD value is a sum of absolute-values of the differences generated by the k subtractors.
Thus the kxe2x88x922 summing adders compensate for bit-wise inversion rather than 2""s complementing of all but 2 of the differences, while the final adder and the half-adder array compensate for a remaining 2 of the differences.
In further aspects the half-adder array does not propagate carries among bit positions but merely outputs carries to the final adder. Thus carries are propagated by the final adder but not by the half-adder array. The half-adder array has a faster propagation delay than does the final adder. The half-adder array has a plurality of half-adder cells; each half-adder cell is unconnected to other half-adder cells. Thus half-adder cells operate independently of other half-adder cells.
In further aspects of the invention, for each pixel-pair input, the first pixel is at a first relative location within the first macroblock and the second pixel is at a second relative location within the second macroblock. Thus the first and second relative locations are a same relative location within a macroblock.