1. Field of the Invention
The invention concerns the digital representation of images generally and more specifically concerns the techniques used to compress digital representations of images before transferring them via a medium with limited bandwidth.
2. Description of Related Art
Digital images are originally represented in the memory of a computer system as arrays of picture elements or pixels. Each pixel represents a single point in the image. The pixel itself is an item of data, and the contents of the item of data determine how the point represented by the pixel will appear in the digital image. The quality of a digital image of course depends on the number of pixels in the image and the size of the pixels; generally, the more pixels there are in the image and the larger the item of data representing each pixel is, the better the image.
Because this is the case, the arrays of pixels used to originally represent high-quality digital images are very large and require large amounts of memory. The size of the arrays is particularly troublesome when the digital images in question are part of a sequence of images that when seen in the proper order and with the proper timing make a moving image. The apparatus that is displaying the sequence of images must be able not only to store them but also to read and display them quickly enough so that the timing requirements for the moving images are met.
The problems of timing and storage are particularly severe where the sequence of digital images is distributed by means of a medium with limited bandwidth to a receiver with limited storage. Examples where this is the case are digital television, videoconferencing, or videotelephony. In these applications, the sequence of images must be transmitted by means of a broadcast or cable television channel or a telephone line to a relatively low-cost consumer device such as a television set, video telephone, or personal computer with limited amounts of memory to store the images. These applications are consequently economically practical only if some way is found to compress the digital images and thereby to reduce the bandwidth required to transmit the images and/or the storage required to store them at their destinations.
The art has developed many different techniques for compressing sequences of digital images. One example of these techniques is the MPEG-2 standard for compressing digital video, described in Background Information on MPEG-1 and MPEG-2 Television Compression, which could be found in November 1996 at the URL http://www.cdrevolution.com/text/mpeginfo.htm. All of these techniques take advantage of the fact that a sequence of digital images contains a great deal of redundant information. One type of redundancy is spatial: in any image, there is liable to be a high degree of similarity among pixels in a given small area of the image. Since that is the case, it is often possible to describe an area in an image by means of a pattern consisting of some small number of pixels and a description of the shape of the area that contains the pattern. Further, where a given area of the image strongly resembles another area of the image but is not identical to the other area, it is possible to replace the pixels in the given area with a representation that describes the given area in terms of the difference between it and the other area.
The other type of redundancy in a sequence of images is temporal; very often, a given image in the sequence is very similar in appearance to an earlier or later image in the sequence; it is consequently possible to compress the given image by making a representation of the given image that represents the difference between the given image and the earlier or later image, termed herein the reference image, and using this representation in place of the representation as an array of pixels.
One way of expressing the difference between the given image and the reference image is shown in FIG. 1. Given image 101 is represented in memory as an array of pixels 105. The image is further divided into blocks 103, each of which is typically 16 pixels square. An object 107 in given image 101 is contained in four adjacent blocks 103: blocks 103(m,n), (m+1,n), (m,n+1), and (m+1,n+1). In reference image 109, object 107 is in a different position, namely blocks 103(b,s), (b+1,s), (b,s+1), and (b+1,s+1), but object 107 otherwise has substantially the same appearance as in given image 101. Since that is the case, object 107 can be described in the compressed representation of given image 101 in terms of its differences from object 107 in reference image 109. There are two kinds of differences:
the change of location of object 107 in given image 101, and
any change of appearance of object 107 in given image 101.
The first kind of difference can be described in terms of an offset of object 107 in given image 101 from its position in reference image 109. The second kind can be described in terms of the difference between the appearance of object 107 in given image 101 and the appearance of object 107 in reference image 109.
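The two kinds of differences can be illustrated with a minimal Python sketch. It is not taken from the specification: the function names (get_block, encode_block, decode_block), the list-of-rows image layout, and the convention that the offset points from the block's position in the given image to the matching position in the reference image are all assumptions made for illustration.

```python
def get_block(image, top, left, size):
    """Return the size-by-size block whose upper-left pixel is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]

def encode_block(current, reference, top, left, offset, size):
    """Describe a block of the given image as (offset, residual).

    offset is (dy, dx): where the matching block sits in the reference image,
    relative to the block's position in the given image (assumed convention).
    residual holds the per-pixel change of appearance.
    """
    dy, dx = offset
    cur = get_block(current, top, left, size)
    ref = get_block(reference, top + dy, left + dx, size)
    residual = [[c - r for c, r in zip(crow, rrow)]
                for crow, rrow in zip(cur, ref)]
    return offset, residual

def decode_block(reference, top, left, offset, residual):
    """Rebuild the block from the reference image, the offset, and the residual."""
    dy, dx = offset
    size = len(residual)
    ref = get_block(reference, top + dy, left + dx, size)
    return [[r + d for r, d in zip(rrow, drow)]
            for rrow, drow in zip(ref, residual)]

# Toy 4x4 images with a 2x2 "object": it moves from (0, 0) in the reference
# image to (2, 2) in the given image, and one of its pixels changes by 1.
reference = [[9, 8, 0, 0],
             [7, 6, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0]]
current = [[0, 0, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 9, 8],
           [0, 0, 7, 7]]

offset, residual = encode_block(current, reference, 2, 2, (-2, -2), 2)
rebuilt = decode_block(reference, 2, 2, offset, residual)
```

In this toy case the residual is zero everywhere except for the one changed pixel, which is why the pair of offset and residual can be far cheaper to represent than the pixels themselves.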
The use of compression techniques such as the ones just described permits the creation of compressed representations of sequences of digital images which are small enough to satisfy the bandwidth and memory constraints typical of commercial digital television, digital teleconferencing, and digital videotelephony. The production of a compressed representation of a digital image from a pixel representation of the digital image is termed herein encoding the image. The image presently being encoded is termed in the following the current image.
Image encoding requires large amounts of computation. The reason for this is that both of the compression techniques described above require that blocks 103 of pixels be compared with each other. In the latter technique in particular, which exploits temporal redundancy, it is necessary to locate blocks of the reference image that are similar to blocks of the current image, and a search for such a block may potentially involve comparing all of the blocks of the reference image with a given block of the current image. Depending on the application, the similarity of the blocks being compared is measured by either of two formulas, the Sum of Pixel Absolute Errors (SAE) and the Sum of the Squared Pixel Errors (SSE). It is noted that the words Error and Difference are used with equivalent meaning within the same context by those skilled in the art. Hence, SAD (Sum of Absolute Differences) and SAE refer to the same block matching computation. Likewise, SSD (Sum of Squared Differences) and SSE are equivalent. Where what are being compared are 16-pixel square blocks 103, SSE is defined for each (u, v) offset of the position of the block in the reference image from the block in the current image:
$$\mathrm{SSE}(u,v) = \sum_{i=0}^{15}\sum_{j=0}^{15}\left[P_{curr}(i,j) - P_{ref}(i+u,\,j+v)\right]^{2}$$
where P_curr is the block being predicted using motion estimation in the current picture and P_ref is a candidate block in the search space in the reference picture displaced from P_curr by the vector (u, v).
For a comparison of 16 by 16 pixel blocks, SAE is defined for each (u, v) offset in the search space as:
$$\mathrm{SAE}(u,v) = \sum_{i=0}^{15}\sum_{j=0}^{15}\left|P_{curr}(i,j) - P_{ref}(i+u,\,j+v)\right|$$
A comparison of two 16-pixel square blocks using SSE would require 256 subtractions, 256 squaring operations (i.e., multiplies), and 255 additions for each considered candidate block predictor. SAE replaces the multiply operation with an absolute value operation.
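The two block matching criteria can be sketched in a few lines of Python. This is an illustrative sketch only; the function names sse and sae and the representation of a block as a list of pixel rows are assumptions, and the candidate block is assumed to have already been extracted at the desired (u, v) offset.

```python
def sse(cur_block, ref_block):
    """Sum of squared pixel errors between two equally sized blocks."""
    return sum((c - r) ** 2
               for crow, rrow in zip(cur_block, ref_block)
               for c, r in zip(crow, rrow))

def sae(cur_block, ref_block):
    """Sum of absolute pixel errors between two equally sized blocks."""
    return sum(abs(c - r)
               for crow, rrow in zip(cur_block, ref_block)
               for c, r in zip(crow, rrow))

# A 16x16 block pair that differs by 3 at every one of its 256 pixels.
cur16 = [[10] * 16 for _ in range(16)]
ref16 = [[7] * 16 for _ in range(16)]

sse_value = sse(cur16, ref16)  # 256 pixels * 3**2
sae_value = sae(cur16, ref16)  # 256 pixels * |3|
```

Each call touches all 256 pixel pairs, which matches the operation counts given above: one subtraction per pixel, then either a multiply (SSE) or an absolute value (SAE), then the accumulation.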
The process of searching for blocks in the reference image that are similar to blocks in the current image is termed herein motion estimation, and as will be immediately apparent from the foregoing, motion estimation requires enormous numbers of block comparisons.
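To make the cost concrete, the following Python sketch performs an exhaustive search over a square window and counts the block comparisons it makes. It is an assumption-laden illustration, not the method of any particular encoder: the names full_search, get_block, and sae, the parameter search_range, the use of SAE as the criterion, and the rule of skipping candidates that fall outside the reference image are all choices made here.

```python
def get_block(image, top, left, size):
    """Return the size-by-size block whose upper-left pixel is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]

def sae(cur_block, ref_block):
    """Sum of absolute pixel errors between two equally sized blocks."""
    return sum(abs(c - r)
               for crow, rrow in zip(cur_block, ref_block)
               for c, r in zip(crow, rrow))

def full_search(current, reference, top, left, size, search_range):
    """Try every (u, v) offset within +/- search_range pixels.

    u is the horizontal and v the vertical displacement (assumed convention).
    Returns (best_offset, best_cost, number_of_block_comparisons).
    """
    height, width = len(reference), len(reference[0])
    cur = get_block(current, top, left, size)
    best_offset, best_cost, comparisons = None, None, 0
    for v in range(-search_range, search_range + 1):
        for u in range(-search_range, search_range + 1):
            ry, rx = top + v, left + u
            # Skip candidates that are not wholly inside the reference image.
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue
            cost = sae(cur, get_block(reference, ry, rx, size))
            comparisons += 1
            if best_cost is None or cost < best_cost:
                best_offset, best_cost = (u, v), cost
    return best_offset, best_cost, comparisons

# Toy 8x8 images: a 2x2 pattern sits at row 1, column 2 of the reference
# image and at row 3, column 3 of the current image.
reference = [[0] * 8 for _ in range(8)]
current = [[0] * 8 for _ in range(8)]
for dy, dx, value in [(0, 0, 5), (0, 1, 6), (1, 0, 7), (1, 1, 8)]:
    reference[1 + dy][2 + dx] = value
    current[3 + dy][3 + dx] = value

result = full_search(current, reference, top=3, left=3, size=2, search_range=2)
# result == ((-1, -2), 0, 25): a 5x5 window means 25 block comparisons.
```

A window of plus or minus r pixels yields up to (2r+1) squared candidates per block, and every candidate costs a full block comparison, which is why the methods discussed next try to reduce either the number of candidates or the cost of each comparison.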
Three different classes of methods are known for reducing the number of comparisons. Two of them are directed to reducing the number of blocks that must be compared; the third is directed to reducing the number of pixels within the block that must be compared.
1. Methods that use heuristics to reduce the number of candidate blocks in the search space that are considered as predictors. Examples of such methods are the Logarithmic Search and the Three-step Search methods, explained in K. R. Rao and J. J. Hwang, Techniques and Standards for Image, Video, and Audio Coding, Prentice-Hall Press 1996.
2. Hierarchical search methods that simultaneously reduce the number of pixels in the computation of the block matching criterion and the number of considered block predictors. These methods are also explained in the Rao reference supra. They generate successively lower-resolution versions of the current and reference pictures by decimating or low-pass filtering by a factor of two in both the horizontal and vertical directions. Block matching is performed at the lowest available resolution and the results are mapped to the next higher resolution, where a limited number of block candidates are considered at displacements localized around the mapped best match.
3. A method that reduces the number of operations required to compute the block matching criteria by comparing only one quarter of the pixels in the block. The method divides the 16×16 block into 64 2×2 sub-blocks and compares only a single pixel in every 2×2 sub-block. The method alternates the pixel being compared between the northwest, northeast, southwest, and southeast pixel in the sub-block at different block predictor offsets. The method is described in detail in B. Liu and A. Zaccarin, "New Fast Algorithms for the Estimation of Block Motion Vectors", IEEE Transactions on Circuits and Systems for Video Technology, vol. 3, no. 2, 1993.
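The third method can be sketched in Python as follows. The sketch shows only the general idea of comparing one pixel per 2×2 sub-block; the exact schedule by which the cited paper alternates among the four pixel positions is not reproduced here, and the rule in pattern_for_offset is a hypothetical stand-in. The names PATTERNS, subsampled_sae, and pattern_for_offset are likewise assumptions.

```python
# (row, column) of the sampled pixel inside each 2x2 sub-block:
# northwest, northeast, southwest, southeast.
PATTERNS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def subsampled_sae(cur_block, ref_block, pattern):
    """SAE over one pixel per 2x2 sub-block.

    Returns (sum_of_absolute_errors, number_of_pixels_compared).
    """
    dy, dx = PATTERNS[pattern]
    total, compared = 0, 0
    for y in range(dy, len(cur_block), 2):
        for x in range(dx, len(cur_block[0]), 2):
            total += abs(cur_block[y][x] - ref_block[y][x])
            compared += 1
    return total, compared

def pattern_for_offset(u, v):
    """Hypothetical alternation rule: pick the pattern from the parity of (u, v)."""
    return (u % 2) + 2 * (v % 2)

# A 16x16 block pair that differs by 3 at every pixel.
cur16 = [[10] * 16 for _ in range(16)]
ref16 = [[7] * 16 for _ in range(16)]

total, compared = subsampled_sae(cur16, ref16, pattern_for_offset(1, 0))
```

For a 16×16 block, 64 pixels are compared instead of 256. Note that the sampled pixels are never adjacent to one another, which is the property the following paragraph identifies as an obstacle to parallel processing.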
The first two methods only address the problem of reducing the number of blocks to be compared, not the problem of reducing the computational cost of the block comparison itself. The third method in fact does reduce the number of comparisons made, but has disadvantages. First, because it does not compare adjacent pixels, it does not take advantage of the parallel processing capabilities of many modern microprocessors. Second, the choice of pixels to be compared does not take into account the fact that the human eye is far more sensitive to vertically-aligned detail than it is to detail with other orientations. Because this is so, particular attention must be paid to vertically-aligned detail in block comparison. It is an object of the present invention to provide improved techniques for comparing blocks of pixels which reduce the number of comparisons made in a block and which at the same time preserve vertical detail, are well adapted to use in the processor of a computer system, and can take advantage of whatever parallel processing capabilities the processor may have. Since the invention is directed to comparing blocks efficiently rather than to selecting the blocks to be compared, the technique can be employed to do block comparison in either of the first two methods described above.