Communications systems typically transmit and receive data at predetermined data rates, so techniques that reduce the data rate required for a given signal are highly valuable. Data compression methods for improving the efficiency of video data transmission (or storage) build on both redundancies in the data and the nonlinearities of human vision. They exploit spatial correlation in still images, and both spatial and temporal correlation in video signals. Compression in space is known as intra-frame compression, while compression in time is called inter-frame compression. Methods that achieve high compression ratios (10:1 to 50:1 for images and 50:1 to 200:1 for video) are typically lossy: the reconstructed image is not identical to the original. Lossless methods do exist, but their compression ratios are far lower, typically no better than 3:1.
The lossy algorithms also generally exploit characteristics of the human visual system. For example, the eye is much more sensitive to fine detail in the luminance (or brightness) signal than in the chrominance (or color) signals. Consequently, the luminance signal is usually sampled at a higher spatial resolution than the chrominance signals. (In broadcast-quality television, for instance, the digital sampling matrix of the luminance signal might be 720 by 480 pixels, while for the color signals it may be only 180 by 240 pixels.) In addition, the eye is less sensitive to energy at high spatial frequencies than at low spatial frequencies. Indeed, if an image on a 13-inch personal computer monitor were formed by alternating black and white pixels, the viewer would see a uniform gray instead of the alternating checkerboard pattern.
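The sample counts implied by these figures can be checked with a short Python sketch. It assumes each chrominance plane is produced by averaging every 4-column by 2-row group of full-resolution samples, which is one simple way to go from 720 by 480 to 180 by 240; actual systems may use other filters. The function name `subsample` and the test pattern are hypothetical.

```python
def subsample(plane, fx=4, fy=2):
    """Average each fy-row by fx-column group of a 2-D list into one sample.

    With fx=4, fy=2 a 720-by-480 plane (width by height) becomes 180 by 240.
    """
    rows, cols = len(plane), len(plane[0])
    if rows % fy or cols % fx:
        raise ValueError("plane size must be a multiple of the subsampling factors")
    out = []
    for r in range(0, rows, fy):
        out_row = []
        for c in range(0, cols, fx):
            group = [plane[r + dy][c + dx] for dy in range(fy) for dx in range(fx)]
            out_row.append(sum(group) / len(group))
        out.append(out_row)
    return out


width, height = 720, 480
chroma = [[(r + c) % 256 for c in range(width)] for r in range(height)]  # test pattern
small = subsample(chroma)
print(len(small[0]), "by", len(small))  # 180 by 240

full = 3 * width * height                                # Y and both color planes at full size
reduced = width * height + 2 * len(small[0]) * len(small)  # Y full, two reduced color planes
print(full, reduced, full / reduced)                     # 1036800 432000 2.4
```

With both color planes reduced this way, a frame carries 432,000 samples instead of 1,036,800, a reduction by a factor of 2.4 before any transform coding is applied.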
Three digital image and video compression standards that have been proposed are the Joint Photographic Experts Group (JPEG) standard for still-picture compression; the Consultative Committee on International Telephony and Telegraphy (CCITT) Recommendation H.261 for video teleconferencing; and the Moving Picture Experts Group (MPEG) standard for full-motion compression on digital storage media (DSM).
JPEG's proposed standard is a still-picture coding algorithm developed by a research team under the auspices of the International Organization for Standardization (ISO). The scope of the algorithm is broad: it comprises a baseline lossy approach and an extended lossless approach, as well as independent functions that use coding techniques different from those of the baseline approach.
FIG. 1A depicts the baseline JPEG algorithm. The baseline algorithm for still-image compression in the proposed JPEG standard divides the image into 8-by-8 pixel blocks, which the figure represents as a 4-by-4 block for simplicity. In the encoder, the image is first digitized, and each block then undergoes a discrete cosine transform (DCT), which for the 4-by-4 example yields 16 frequency coefficients. This two-dimensional coefficient array is read in a zigzag fashion to reorder it into a linear array. The coefficients are then quantized (in the figure, by dividing each by 10), and the quantized values are coded using a Huffman table (a variable-length coder).
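The encoder steps can be sketched in Python for the 4-by-4 block of FIG. 1A. This is a minimal illustration under stated assumptions, not the JPEG reference implementation: it uses an orthonormal DCT-II and the uniform divisor of 10 from the figure, and it omits the Huffman stage. Baseline JPEG itself uses 8-by-8 blocks, level-shifts samples by 128 before the DCT, and applies a per-coefficient quantization table before the zigzag scan; with a single uniform divisor, as here, quantizing before or after the reordering gives the same result. The function names are hypothetical.

```python
import math

N = 4  # block size used in the simplified figure; baseline JPEG uses 8


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N-by-N block given as a list of lists."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency
        for v in range(N):      # horizontal frequency
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


def zigzag_order(n):
    """(row, col) positions in zigzag order: anti-diagonals, alternating direction."""
    order = []
    for d in range(2 * n - 1):
        diag = [(i, d - i) for i in range(n) if 0 <= d - i < n]
        order.extend(diag if d % 2 else reversed(diag))
    return order


def encode_block(block, step=10):
    """DCT, zigzag scan, then uniform quantization (divide by `step` and round)."""
    coeffs = dct_2d(block)
    linear = [coeffs[r][c] for r, c in zigzag_order(N)]
    return [round(value / step) for value in linear]


flat = [[80] * N for _ in range(N)]
print(encode_block(flat))  # [32] followed by fifteen zeros
```

A flat block has only a DC term (0.25 times the sum of its 16 samples, 320 here), so after quantization a single nonzero level remains and the zigzag scan leaves a long run of zeros for the variable-length coder.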
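The decoding path reverses these steps: it decodes the variable-length coded (VLC) output to recover the quantized coefficients, turns the linear array back into a 2-D array through an inverse zigzag operation, and then applies inverse quantization and an inverse DCT to reconstruct the block of pixels.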
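A matching decoder sketch, under the same assumptions (4-by-4 block, orthonormal DCT, uniform step of 10, entropy decoding omitted), rebuilds a block from a list of quantized levels. The function names are hypothetical.

```python
import math

N = 4  # block size from the simplified figure


def zigzag_order(n):
    """(row, col) positions in zigzag order: anti-diagonals, alternating direction."""
    order = []
    for d in range(2 * n - 1):
        diag = [(i, d - i) for i in range(n) if 0 <= d - i < n]
        order.extend(diag if d % 2 else reversed(diag))
    return order


def inverse_zigzag(linear, n=N):
    """Place a zigzag-ordered list back into an n-by-n array."""
    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(linear, zigzag_order(n)):
        block[r][c] = value
    return block


def idct_2d(coeffs):
    """Inverse of the orthonormal 2-D DCT-II (that is, a 2-D DCT-III)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    pixels = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (scale(u) * scale(v) * coeffs[u][v]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            pixels[x][y] = total
    return pixels


def decode_block(levels, step=10):
    """Inverse zigzag, inverse quantization (multiply by `step`), inverse DCT."""
    coeffs = inverse_zigzag([level * step for level in levels])
    return idct_2d(coeffs)


decoded = decode_block([32] + [0] * 15)
print([[round(p) for p in row] for row in decoded])  # every sample is 80
```

Quantization discards information, so for most blocks the decoded pixels only approximate the originals. This example decodes exactly to 80 because a flat block of 80 has a single nonzero DCT coefficient, 320, which is an exact multiple of the step size.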
FIG. 1B depicts the CCITT algorithm. The algorithm operates on a difference signal generated by an inter-frame predictive coder. As in the JPEG algorithm, each 8-by-8-pixel block of the frame is transformed with the DCT and then quantized, as indicated by the block labeled Q. There are two signal paths at the output of quantization block Q: one leads toward a receiver through a lossless coder and optional error-correction circuitry; the other, a feedback path, is inverse quantized and undergoes an inverse DCT to yield a reconstructed block for storage in frame memory. Reconstruction is needed because inter-frame compression uses predictive coding, which requires the encoder to track the behavior of the decoder so that the decoder's reconstructed image does not diverge from the original input. When the entire frame has been processed, the frame memory block holds the reconstructed image as seen by the decoder. Next, inter-frame coding is applied. To compensate for motion, each 8-by-8 block in the current frame is compared with candidate blocks within a search window in the frame memory. A motion vector representing the offset between the current block and the best-matching block in the prior reconstructed image is then coded and sent to the receiver. The predictor supplies this motion-compensated 8-by-8 block from the reconstructed frame, and the difference between it and the original block is transform coded, quantized, and losslessly coded before being sent to the receiver.
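The motion search can be sketched as exhaustive block matching in Python. The text does not specify the matching criterion, search range, or handling of frame edges, so the sketch assumes the sum of absolute differences (SAD), a range of plus or minus 4 pixels, and simply skips candidates that extend outside the reference frame. The names `sad` and `motion_search` are hypothetical.

```python
def sad(cur, ref, top, left, rtop, rleft, n):
    """Sum of absolute differences between two n-by-n blocks."""
    return sum(abs(cur[top + i][left + j] - ref[rtop + i][rleft + j])
               for i in range(n) for j in range(n))


def motion_search(cur, ref, top, left, n=8, search=4):
    """Exhaustive block matching; returns ((dy, dx), sad) of the best candidate.

    (dy, dx) is the offset from the current block's position to the matching
    block in the reference (previously reconstructed) frame.
    """
    rows, cols = len(ref), len(ref[0])
    best_cost, best_mv = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rtop, rleft = top + dy, left + dx
            # Skip candidates that would extend past the reference frame.
            if not (0 <= rtop <= rows - n and 0 <= rleft <= cols - n):
                continue
            cost = sad(cur, ref, top, left, rtop, rleft, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv, best_cost


size = 24
ref = [[r * size + c for c in range(size)] for r in range(size)]  # every value unique
# Current frame whose content satisfies cur[r][c] == ref[r + 2][c - 1].
cur = [[ref[r + 2][c - 1] if r + 2 < size and c >= 1 else 0 for c in range(size)]
       for r in range(size)]
print(motion_search(cur, ref, top=8, left=8))  # ((2, -1), 0)
```

Full search evaluates every offset in the window, (2 * 4 + 1) squared or 81 candidates here, which guarantees the best SAD match at a cost that grows with the square of the search range.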
The CCITT decoder, shown at the bottom of FIG. 1B, first corrects errors in the incoming bit stream and then decodes the data in the variable-length decoder. Inverse quantization recovers the DCT coefficients, and the inverse DCT converts them back into a block of pixel values (or, in inter-frame mode, difference values). The decoder's frame memory holds reconstructed blocks matching those stored in the encoder's feedback loop. In inter-frame mode, motion vectors extracted by the variable-length decoder give the locations of the predicted blocks in frame memory, and each predicted block is added to the corresponding decoded difference block.
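Why the decoder's frame memory must mirror the encoder's feedback loop can be seen in a one-dimensional analogue of predictive coding. This Python sketch is a hypothetical illustration, not the CCITT algorithm: each sample is predicted from the previous one and the prediction error is quantized with a step of 10. In the open-loop variant the encoder predicts from original samples, which the decoder never sees, so quantization errors accumulate; in the closed-loop variant the encoder predicts from the same reconstruction the decoder computes, so the error stays within half a step.

```python
def quantize(d, step):
    """Uniform quantizer: nearest multiple of `step` (Python rounds ties to even)."""
    return step * round(d / step)


def open_loop(signal, step):
    """Encoder predicts from the previous ORIGINAL sample; decoder sums what it receives."""
    prev_original, decoded, out = 0, 0, []
    for x in signal:
        decoded += quantize(x - prev_original, step)  # decoder's running reconstruction
        prev_original = x
        out.append(decoded)
    return out


def closed_loop(signal, step):
    """Encoder predicts from its own copy of the decoder's reconstruction."""
    decoded, out = 0, []
    for x in signal:
        decoded += quantize(x - decoded, step)  # same value the decoder computes
        out.append(decoded)
    return out


ramp = [3 * k for k in range(20)]
for name, coder in (("open loop", open_loop), ("closed loop", closed_loop)):
    recon = coder(ramp, step=10)
    print(name, max(abs(x - r) for x, r in zip(ramp, recon)))
# open loop 57
# closed loop 5
```

On this slow ramp every open-loop difference of 3 quantizes to zero, so the decoder never moves off zero and ends 57 levels away from the input; the closed-loop error never exceeds 5.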
The foregoing compression techniques may be applied directly to stationary images that have been sampled on a rectangular grid of the type depicted in FIG. 2. Conventional television signals, however, use interlaced scanning, so an individual field does not contain a complete representation of the image. In a 525-line television picture, in which each frame consists of two fields, half of the scan lines are displayed in the even-numbered fields and the remainder in the odd-numbered fields, as shown in FIGS. 3A and 3B. The human eye and brain partially integrate successive fields and thereby perceive all of the active lines.
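Separating a frame into its two fields, and weaving them back together, amounts to taking alternate lines. The Python sketch below assumes the frame is stored as a list of active lines and ignores the half-lines of a real 525-line raster (262.5 lines per field); which field is called even or odd is a labeling convention. The names `split_fields` and `weave` are hypothetical.

```python
def split_fields(frame):
    """Return (lines 0, 2, 4, ...) and (lines 1, 3, 5, ...) of a frame."""
    return frame[0::2], frame[1::2]


def weave(first, second):
    """Interleave two fields back into a frame; the first field supplies line 0."""
    frame = []
    for i in range(max(len(first), len(second))):
        if i < len(first):
            frame.append(first[i])
        if i < len(second):
            frame.append(second[i])
    return frame


frame = [f"line {i}" for i in range(6)]
top, bottom = split_fields(frame)
print(top)     # ['line 0', 'line 2', 'line 4']
print(bottom)  # ['line 1', 'line 3', 'line 5']
print(weave(top, bottom) == frame)  # True
```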
One effect of interlaced scanning is to reduce the spatial correlation within a local region of the image. For example, if an n-by-n pixel block is taken from a single field, it spans 2n lines of the frame and contains only alternate lines. Alternatively, if the n-by-n block is taken from n consecutive frame lines (n/2 from each field), spatial correlation is reduced in moving areas of the image because of the 1/60-second interval between fields: a horizontally moving object occupies different positions in the two fields, so the interleaved lines are offset from one another and the object appears blurred, or as an "artifact." This phenomenon is illustrated in simplified form in FIGS. 4A and 4B, where FIG. 4A depicts a static image and FIG. 4B depicts a scene with horizontal motion. The areas of movement have low spatial correlation and thus cannot be described efficiently by the low-frequency terms of a DCT. The same difficulty arises with vector quantization and other compression techniques. Accordingly, the object of the present invention is to provide methods and apparatus for increasing the correlation in data representing moving areas of a television or video picture so that the data can be compressed without a loss in picture quality.
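The loss of correlation in moving areas can be quantified with a small Python experiment. It is an illustration, not the method of the invention: an 8-by-8 frame block contains a vertical edge, and in the moving case the lines from the second field show the edge two pixels further right, as a horizontally moving object would. The sketch measures the share of block energy that an orthonormal DCT places in coefficients with nonzero vertical frequency. The block contents and function names are hypothetical.

```python
import math

N = 8


def dct_2d(block):
    """Orthonormal 2-D DCT-II; coeffs[u][v], u = vertical, v = horizontal frequency."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs


def edge_row(column):
    """One line of a dark-to-bright vertical edge located at `column`."""
    return [0 if c < column else 255 for c in range(N)]


def vertical_ac_fraction(block):
    """Share of block energy in coefficients with nonzero vertical frequency (u > 0)."""
    coeffs = dct_2d(block)
    total = sum(c * c for row in coeffs for c in row)
    vertical = sum(c * c for row in coeffs[1:] for c in row)
    return vertical / total


# Static scene: the edge is in the same place on every frame line.
static = [edge_row(4) for _ in range(N)]
# Horizontal motion: lines from the second field show the edge two pixels further right.
moving = [edge_row(4) if line % 2 == 0 else edge_row(6) for line in range(N)]

print(round(vertical_ac_fraction(static), 4))  # 0.0
print(round(vertical_ac_fraction(moving), 4))  # 0.1667
```

The static block puts all of its energy in the top row of coefficients (zero vertical frequency), whereas the offset between fields pushes one sixth of the energy into coefficients with nonzero vertical frequency.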