Images are conventionally represented by a two-dimensional array of values in which each value represents a property of the image at a corresponding point on the image. In the case of gray-scale images, a single number representing the gradations of intensity from white to black, referred to as the gray scale, is stored. In the case of color images, each "value" is a vector whose components represent the gradations in intensity of the various primary colors, or some alternative color code, at the corresponding point in the image. A motion picture comprises a sequence of such images. Typically, 24 or more images are needed for each second of viewing time.
This representation of a motion picture corresponds to the output of a typical image-sensing device such as a television camera. Such a representation is convenient in that it is easily regenerated on a display device such as a CRT. However, the number of bits needed to represent the data is prohibitively large for many applications. A single 512×512 gray-scale image with 256 gray levels (8 bits per pixel) requires in excess of 256,000 bytes. At 30 frames per second, a communication channel with a bandwidth of approximately 63 million bits per second is needed to transmit the motion picture. A full-color, 24-bit-per-pixel motion picture would require a bandwidth of approximately 189 million bits per second.
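The bandwidth figures above follow from simple multiplication. The short Python sketch below reproduces the arithmetic; the function name `raw_bitrate` is only an illustrative choice and does not come from the source.

```python
def raw_bitrate(width, height, bits_per_pixel, frames_per_second):
    """Return the uncompressed bit rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


# One 512x512 frame at 8 bits per pixel (256 gray levels), in bytes.
frame_bytes = 512 * 512 * 8 // 8

# Raw bit rates at 30 frames per second.
gray_bps = raw_bitrate(512, 512, 8, 30)
color_bps = raw_bitrate(512, 512, 24, 30)

print(frame_bytes)  # 262144
print(gray_bps)     # 62914560
print(color_bps)    # 188743680
```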
This bandwidth is significantly greater than that available for many communications purposes. In addition, the storage required for a two hour digitally recorded movie exceeds the storage capacity of the available CDs. Hence, some form of image compression system must be utilized to store and transmit high quality video signals.
Image compression systems used in motion picture compression applications make use of the redundancy within frames and between frames to reduce the amount of information needed to represent the video sequence. For example, many scenes in a motion picture include portions that are constant from frame to frame for several seconds, if not minutes. The constant portion need only be sent once. Hence, if the constant portion comprises a significant fraction of the scene, considerable image compression can be realized. For example, if the entire scene were constant for 10 seconds, the information needed to reproduce the scene would be contained in one frame of the sequence and an indication of the number of times the frame is to be repeated. At 30 frames per second, this would be approximately 1/300 of the information needed if the scene were sent without compression.
One method for representing a sequence of images is to utilize an image as a reference frame. Successive frames in the sequence are then represented by a two-step process. First, the current frame is divided into a series of blocks, and the frame is approximated by finding blocks in the reference frame that match the blocks in the current frame. The blocks in the reference frame may be displaced from the blocks in the current frame. This approximation may be viewed as a set of instructions in the form "reproduce the block starting at coordinates (n,m) in the current frame by the block at coordinates (N,M) in the reference frame". The sequence of instructions provides an approximation of the current frame. Second, this approximation is subtracted from the current frame to form a residual frame. Ideally, the residual frame has substantially less information than the current frame. The residual frame is then further compressed using one of the still image compression algorithms, such as those based on the discrete cosine transform (DCT) or subband coding.
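The two-step process can be sketched in Python as follows. This is a minimal illustration, not the method of any particular coder: it assumes frames are lists of rows of integer gray values, that coordinates are (row, column) pairs naming the top-left corner of a block, and that the instructions have already been found. The names `predict_frame` and `residual_frame` and the sample data are hypothetical.

```python
def predict_frame(reference, instructions, block_size, height, width):
    """Approximate the current frame by copying blocks from the reference.

    Each instruction is ((n, m), (N, M)): the block at (n, m) in the
    approximation is reproduced by the block at (N, M) in the reference.
    """
    approx = [[0] * width for _ in range(height)]
    for (n, m), (N, M) in instructions:
        for dy in range(block_size):
            for dx in range(block_size):
                approx[n + dy][m + dx] = reference[N + dy][M + dx]
    return approx


def residual_frame(current, approx):
    """Subtract the approximation from the current frame, pixel by pixel."""
    return [[c - a for c, a in zip(cur_row, app_row)]
            for cur_row, app_row in zip(current, approx)]


reference = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
current = [
    [6, 7, 3, 4],
    [10, 11, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 18],
]
# The top-left block is matched by a displaced reference block; the other
# three blocks are matched by the co-located reference blocks.
instructions = [
    ((0, 0), (1, 1)),
    ((0, 2), (0, 2)),
    ((2, 0), (2, 0)),
    ((2, 2), (2, 2)),
]

approx = predict_frame(reference, instructions, 2, 4, 4)
residual = residual_frame(current, approx)
for row in residual:
    print(row)
```

In this example only one pixel of the residual is non-zero, which is the situation in which the residual carries much less information than the current frame.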
Many models have been devised to represent the apparent motion in a video sequence. However, the constraints imposed by the computational costs associated with performing motion estimation limit commercial video coder systems to models that only track translational motion. In such block-matching algorithms (BMAs), it is assumed that every pixel in a block has the same motion relative to the block in the reference frame, and that each block in the current frame is predicted from blocks in the reference frame that have undergone some type of motion.
Even with these constraints, BMA systems impose significant computational loads on the compression system. Consider a system in which an N×N frame is divided into n×n blocks. Each block must be compared to all possible blocks in the reference frame from which it could have been derived by the motion of an object in the reference frame. The simplest matching algorithm computes the sum of the absolute differences of the pixel values between a candidate block in the reference frame and the block in the current frame. Hence, a minimum of n² subtractions are required per candidate block. If the region in the reference frame over which the search is performed is M×M pixels, then the computational workload is of order M²n² per block in the current frame. Hence, it is advantageous to minimize the search area, i.e., reduce M.
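The cost of an exhaustive search can be made concrete with a small Python sketch. It assumes frames are lists of rows of integers, positions are (row, column) pairs, and the search window is expressed as a maximum displacement `search_range` in each direction, so that 2 × `search_range` + 1 plays the role of M. The function names and test data are illustrative only.

```python
def sad(current, reference, cur_pos, ref_pos, n):
    """Sum of absolute differences between two n x n blocks."""
    (cy, cx), (ry, rx) = cur_pos, ref_pos
    total = 0
    for dy in range(n):
        for dx in range(n):
            total += abs(current[cy + dy][cx + dx] - reference[ry + dy][rx + dx])
    return total


def full_search(current, reference, cur_pos, n, search_range):
    """Test every displacement within +/- search_range in both directions.

    Returns (best_sad, best_displacement, candidates_tested).
    """
    height, width = len(reference), len(reference[0])
    cy, cx = cur_pos
    best = None
    tested = 0
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = cy + dy, cx + dx
            if ry < 0 or rx < 0 or ry + n > height or rx + n > width:
                continue  # candidate block falls outside the reference frame
            tested += 1
            cost = sad(current, reference, cur_pos, (ry, rx), n)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best[0], best[1], tested


# Reference: an 8x8 ramp. Current: the same content displaced by
# one row and two columns (pixels with no source are set to 0).
reference = [[10 * y + x for x in range(8)] for y in range(8)]
current = [[reference[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0
            for x in range(8)] for y in range(8)]

best_sad, displacement, tested = full_search(current, reference, (2, 2), 2, 2)
print(best_sad, displacement, tested)  # 0 (1, 2) 25
print(tested * 2 * 2)                  # 100 subtractions for this one block
```

Doubling `search_range` roughly quadruples the number of candidates, which is why the size of the search area dominates the workload.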
If the search area is set too small, the best match will not always be found, since the corresponding block may be outside the search area. In the absence of noise, this case can sometimes be detected by noting that the best fit lies on the boundary of the search area. However, in the presence of noise, the function being optimized will have local minima generated by the noise. These local minima can be mistaken for a match. When this occurs, the approximation created by the BMA is poor, and the degree of compression obtainable is significantly reduced. If the compression algorithm maintains a minimum compression ratio, the quality of the reconstructed image may also be reduced.
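The boundary test mentioned above can be sketched in a few lines of Python. It assumes the search returns a (row, column) displacement and that the window allows displacements up to `search_range` in each direction; the helper name is hypothetical. As the text notes, this check is only a hint in the noise-free case and says nothing about local minima caused by noise.

```python
def on_search_boundary(displacement, search_range):
    """Return True if the best displacement lies on the edge of the window."""
    dy, dx = displacement
    return abs(dy) == search_range or abs(dx) == search_range


# With a window of +/-2, a best match at (1, 2) sits on the boundary,
# suggesting the true match may lie outside the search area.
print(on_search_boundary((1, 2), 2))  # True
print(on_search_boundary((1, 1), 2))  # False
```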
Accordingly, to avoid missing the matching block in the reference frame, prior art systems set the search area to encompass the maximum distance that a block could have moved from one frame to the next, given the typical speeds with which objects move in the physical world. This leads to an increased computational load.
Broadly, it is the object of the present invention to provide an improved image compression system for motion picture sequences.
It is a further object of the present invention to provide an improved BMA.
It is a further object of the present invention to provide a BMA that is more robust in the presence of noise than prior art BMAs.
It is a still further object of the present invention to provide a BMA having a smaller search area than prior art BMAs.
These and other objects of the present invention will become apparent to those skilled in the art from the following detailed description of the invention and the accompanying drawings.