The invention relates generally to a method of and apparatus for coding and compressing information. More specifically, the invention relates to a method for coding and compressing digital signal data, such as digitized video images, for storage and transmission. While the following discussion refers to compression of digitized video images, it is to be understood that the invention is not restricted to video data, and may produce benefits in other fields where compression is used.
As the digital revolution has spread to many industries and technologies, the need to transmit, store and manipulate large amounts of data has grown. Consequently, there have been many attempts to code and/or compress data in order to reduce the necessary storage capacity, transmission rate, and processing speed to perform a given task.
For example, one of the fields that has recently developed a need for intensive digital data handling is video technology. This field includes applications such as the transmission of real-time video over telephone lines for teleconferencing and the use of digital video transmission for home television viewing as well as interactive television and computer applications. Presently, there are limitations imposed by the quality and available bandwidth of existing transmission lines as well as the capacity of storage devices necessary to store and transmit image data.
To reduce the data capacity requirements of such video systems and/or improve performance of systems with limited data capacity, various methods have been devised. So-called "lossless" compression methods rely on redundancies within the data, for instance by assigning a single code to represent an entire block of data that may repeat several times. Other methods are considered "lossy," because some of the data is lost in the compression/decompression process resulting in images that differ from the originals. Both methods are useful and practical and are commonly used together to create an effective compression system.
In the compression of video images, known lossy techniques produce undesirable effects in the final decompressed image, such as pixellation and posterization. These undesirable effects are known as "artifacts". Pixellation occurs when the number of stored pixels is reduced for the total image, i.e., the resolution is decreased, leading to jagged lines and squared-off curves. Posterization occurs when the number of values representing pixel brightness and color is reduced. For example, typical digitized monochrome video images normally have 256 shades of gray. If that number is reduced to 16 or 32 for the same image, areas with smooth shading gradations now have regions of uniform shade, and transitions from one shade to the next are obvious. Other lossy techniques, such as using a low-pass filter to eliminate high-frequency noise, also eliminate the high-frequency portion of the image, making it muddy and lacking in detail.
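By way of illustration only, the posterization effect described above can be sketched as follows. The function name `posterize`, the choice of bucket midpoints as the output shades, and the assumption that the number of levels divides 256 evenly are all hypothetical choices made for this sketch; they are not taken from any particular prior system.

```python
def posterize(pixels, levels):
    """Reduce 8-bit gray values (0..255) to `levels` evenly spaced shades.

    Assumption for this sketch: `levels` divides 256 evenly.
    """
    step = 256 // levels
    # Every pixel in a bucket of width `step` is replaced by the bucket midpoint.
    return [(p // step) * step + step // 2 for p in pixels]


# A smooth gradient containing every one of the 256 gray shades.
ramp = list(range(256))
poster = posterize(ramp, 16)

# Only 16 distinct shades survive, so the gradient becomes 16 flat bands.
distinct_shades = sorted(set(poster))
```

In this sketch each band is 16 pixels wide, and the jump of 16 gray levels between adjacent bands is the visible transition referred to above.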
To better preserve the quality of the image, others have applied lossless compression methods. However, when the process necessarily includes analog components (cameras, tape equipment, etc.), these methods are overkill. Any analog component, even the analog-to-digital (A/D) converters used to digitize an image, will add a certain amount of noise to the digital data captured, and this amount of noise varies from system to system. Lossless compression methods compress, transmit and decompress this noise accurately, along with the "good" data, at great expense in storage and processing requirements. With some methods, complex and noisy images can even be larger in their "compressed" form than when not compressed. Unfortunately, there is no obvious way to determine what is noise and what is data once the digital signal is produced; to the system, it is all data. It is possible, although complicated, to use various forms of "dithering" with filtered data to add a somewhat random or pseudo-random aspect to the reproduced image, in the hope of ameliorating some of the effects of filtering and posterization. Often, these methods involve the addition of a separate machine-generated random signal during compression and/or decompression. Of course, this requires many additional components and method steps, driving up the costs and slowing down maximum processing speeds.
One relatively simple compression scheme well known in the art is called "delta encoding". Although primarily used in compression of digitized audio signals, several attempts have been made to apply the principle to the compression of image data. In delta encoding, a series of data values is encoded as a first value and a series of differences, or deltas, between each value and the next value. Delta encoding holds advantages when used in conjunction with lossless compression means such as Huffman coding, known in the art, which take advantage of the statistical frequency of values in a series to achieve data compression. The advantage of delta encoding arises from the fact that the frequency distribution of the differences between successive values in a series is often much less uniform than the distribution of the actual values, and in many cases this provides substantial gain in the compression of such data.
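A minimal sketch of lossless delta encoding, as described above, is given below for illustration. The function names and the sample data are assumptions made for this sketch. The Shannon entropy of the symbols is used here only as a rough stand-in for the size a statistical coder such as Huffman coding could approach; no actual Huffman coder is implemented.

```python
import math
from collections import Counter


def delta_encode(values):
    """Return the first value followed by the differences between neighbors."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def delta_decode(encoded):
    """Rebuild the original series by accumulating the deltas."""
    out = []
    for i, d in enumerate(encoded):
        out.append(d if i == 0 else out[-1] + d)
    return out


def entropy_bits(symbols):
    """Shannon entropy in bits per symbol of the observed symbol frequencies."""
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())


# Hypothetical smooth gradient: 64 samples, each one gray level brighter.
samples = list(range(100, 164))
encoded = delta_encode(samples)

raw_entropy = entropy_bits(samples)         # every value is distinct
delta_entropy = entropy_bits(encoded[1:])   # every delta is identical
```

For this gradient the raw values are all different, while the deltas are all equal to one, so the distribution of deltas is far more sharply peaked than that of the values. Real image data is less extreme, but the same tendency is what a statistical coder exploits.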
So far, what has been described with delta encoding is a lossless compression method, in that the decoded values will be identical to the encoded values. Limiting the allowable delta values which can be encoded to a subset of the possible delta values constitutes a "lossy" compression process known as delta quantization. The quantizing function must include a means of choosing one of the allowable deltas if the actual delta is not equal to any of them; a simple and effective rule is to choose the allowable delta closest to the actual delta. It should be noted that when using delta quantization, the error, or difference between the input pixel value and the corresponding decoded value, must be added into the next delta prior to quantization; this error is therefore incorporated into subsequently processed pixel values, which is to say the error is distributed in the direction corresponding to pixel processing. An equivalent way of achieving this is to produce each delta not by taking the difference between successive pixel values in the original image, but by taking the difference between the present pixel value and the last decoded pixel value, which is often maintained anyway by the compression apparatus for reference display.
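The delta quantization process just described can be sketched as follows, using the second, equivalent formulation in which each delta is taken against the last decoded value. This is an illustrative sketch only: the function names, the particular table of allowable deltas, and the sample pixel values are assumptions, and clamping of decoded values to the valid pixel range is omitted for brevity.

```python
def nearest_allowed(delta, allowed):
    """Choose the allowable delta closest to the actual delta.

    On a tie, the entry listed first in `allowed` wins (a choice made
    for this sketch only).
    """
    return min(allowed, key=lambda q: abs(q - delta))


def dq_encode(pixels, allowed):
    """Encode pixels as a verbatim first value plus quantized deltas."""
    first = pixels[0]
    decoded = [first]          # what the decoder will reconstruct
    codes = []
    for p in pixels[1:]:
        # Difference from the last DECODED value, so the previous
        # quantization error is carried into this delta automatically.
        q = nearest_allowed(p - decoded[-1], allowed)
        codes.append(q)
        decoded.append(decoded[-1] + q)
    return first, codes, decoded


def dq_decode(first, codes):
    """Rebuild the pixel values by accumulating the quantized deltas."""
    out = [first]
    for q in codes:
        out.append(out[-1] + q)
    return out


# Hypothetical table of allowable deltas and a short run of pixel values.
ALLOWED = (-16, -4, -1, 0, 1, 4, 16)
pixels = [100, 102, 110, 140, 141, 141]
first, codes, decoded = dq_encode(pixels, ALLOWED)
```

Because each delta is measured from the decoder's own reconstruction, the encoder and decoder stay in step, and the error left over at one pixel is made up at later pixels rather than accumulating without bound.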
Drawbacks of delta quantization particularly as applied to images are: reduced spatial frequency response, posterization, and edge artifacts. Frequency response is dependent on the size of the largest allowable delta values; posterization results from the choice of the smallest values; and edge artifacts result from both of these factors as well as from the fact that previous delta encoding schemes distribute error in a single spatial dimension.
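The dependence of these drawbacks on the choice of allowable deltas can be illustrated with the following sketch. The helper `follow` and both delta tables are hypothetical and chosen only to make the two effects visible; ties between two equally close deltas are resolved in favor of the entry listed first.

```python
def follow(pixels, allowed):
    """Decoded values when each delta is snapped to the nearest allowed delta."""
    prev = pixels[0]
    out = [prev]
    for p in pixels[1:]:
        prev += min(allowed, key=lambda q: abs(q - (p - prev)))
        out.append(prev)
    return out


# Sharp edge: the input jumps from 0 to 200 in a single pixel.
edge = [0] + [200] * 15
# The largest allowable delta is 16, so the output can rise only 16 per pixel.
smeared_edge = follow(edge, (-16, -4, -1, 0, 1, 4, 16))

# Gentle ramp: the input rises by one gray level per pixel.
ramp = list(range(16))
# The smallest nonzero allowable delta is 4, so the output moves in steps of 4.
banded_ramp = follow(ramp, (-16, -4, 0, 4, 16))
```

In the first case the decoded values need many pixels to reach the level of the edge, which corresponds to the reduced spatial frequency response noted above. In the second case the smooth ramp is reproduced as a staircase of flat bands, which corresponds to posterization.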
Furthermore, the existence of substantial noise elements introduced by the analog components of the video process (including grain from film sources) reduces the ability of delta encoding to produce an image of acceptable quality along with substantial data compression. The reason for this is that noise manifests itself primarily in the high-frequency domain, which creates a wide distribution of small differentials between subsequent data values. The designer of a digital video compression system using delta quantization techniques is faced with a dilemma: either allow a large number of small delta values to be used, resulting in less compression, or accept noticeable edge and posterization artifacts. For this reason, delta encoding for video compression has been limited to use in situations where the level of compression needed is small, or where image quality is not of paramount importance (certain computer applications).
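The effect of noise on the distribution of deltas can be illustrated with the sketch below. It is a simplified model under stated assumptions: the noise is modeled as a uniformly distributed integer offset of up to three gray levels, generated with a fixed seed, and Shannon entropy is used as a rough measure of how widely the deltas are spread. Real analog noise and film grain are not claimed to follow this model.

```python
import math
import random
from collections import Counter


def entropy_bits(symbols):
    """Shannon entropy in bits per symbol of the observed symbol frequencies."""
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())


def deltas(values):
    """Differences between each value and the one before it."""
    return [b - a for a, b in zip(values, values[1:])]


rng = random.Random(0)  # fixed seed so that the sketch is repeatable

# Hypothetical clean scan line: a slow gradient from 0 to 255.
clean = [i // 4 for i in range(1024)]
# The same line with a small random offset added to each pixel, clamped to 0..255.
noisy = [max(0, min(255, v + rng.randint(-3, 3))) for v in clean]

clean_spread = entropy_bits(deltas(clean))
noisy_spread = entropy_bits(deltas(noisy))
```

The clean line produces only two distinct deltas, whereas the noisy line produces many small deltas of both signs, so a table of allowable deltas must either represent many of these small values or discard them.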
Typically, where high levels of compression must be achieved, along with superior video quality, more complicated means have been pursued, including discrete cosine transform, vector quantization, wavelet, and other techniques known in the art. In addition, some compression systems as described above which aim at high levels of compression use inter-frame and motion estimation techniques, which further increase complexity and cost.
A brief summary of the features of such complicated means is set forth:
Vector Quantization (VQ). A highly time asymmetrical algorithm often used on PCs. Results in poor quality video, although hardware to play it back can be produced cheaply. Several proponents of VQ have given up on this technology.
Fractal Compression. The picture quality is poor, it is even more time asymmetrical than VQ, and the complexity of decompression makes this likely to be too slow to run video.
Wavelets. This technology is capable of good quality video and is fairly time symmetrical, although it is very complex on the decompression side. Several problems with this technology forced at least one company developing it to drop it.
Discrete Cosine Transform. This is the core of the JPEG still image standard and the elusive MPEG standard. Although very complicated, this algorithm is time symmetrical and results in reasonable quality video. Artifacts inherent in this scheme, however, are usually obvious and annoying.
MPEG. This is a version of DCT with bidirectional motion estimation. It is very complicated, highly asymmetrical and will have digital artifacts.
VQ is undesirable on the basis of quality. The rest of the above technologies are simply too expensive. A reasonable implementation of the MPEG standard will require 3 million gates. A chip of this size is comparable in complexity to the Intel Pentium or the DEC Alpha. Given typical yields, a 3 million gate chip is very expensive to design and manufacture.
The problems associated with noise are not solved by these systems; they merely manifest themselves in other ways than with delta encoding schemes. Complex filtering processes and quantization of frequency components can increase compression levels by reducing noise, but often create artificial-looking results and introduce artifacts such as texture baldness. Inter-frame and motion estimation techniques exacerbate the problem, causing artifacts such as blockiness and frozen noise.