Systems for digital transmission and storage of television, still image and other video signals have been shown to perform significantly better than corresponding analog systems. The inherent advantages of the digital communication and storage techniques are primarily due to the fact that information is transmitted and stored in a binary form which is much less susceptible to noise, electronic component distortions and other distortions in conventional analog systems. In addition, the representation of the video signals in a digital form enables the use of noise reduction techniques and advanced signal processing algorithms which may be difficult or impossible to implement when operating on conventional analog signals. Digital signal representation and processing can also ensure exact repeatability of the system output signals, regardless of the electronic circuitry or transmission media. It is therefore expected that digital video systems will soon replace existing analog systems just as digital compact discs have largely replaced analog records in the audio industry.
The advantages of digital transmission techniques come, however, at the expense of a much wider required frequency bandwidth. This is particularly true in the case of high definition television ("HDTV") and modern multimedia systems where huge volumes of data have to be processed and stored, often in real time. It appears that in the future the demand for information storage and exchange will grow at an even faster pace. This demand, due to the physical limitations of the available communication channels and the electronic circuitry, presents serious technical problems such as the acquisition, processing and storage of large volumes of data in real time.
Another important problem is the transmission of video signals over the available communication channels and the prevention of interference from sources adjacent in frequency. At present, the most popular and inexpensive means for the transmission of digital data is the public telephone network. This network was, however, designed to receive, reroute and transmit analog signals in the voice frequency range, which is significantly lower than the range required for video communication. The physical limitations of the available network impose severe restrictions on the achievable signal transmission rates, which in turn make the telephone network at present unsuitable for general purpose handling of digital signals, and of motion video signals in particular.
To illustrate the complexity of the problems, consider a modern video communication and storage system, such as those used in video-conferencing, cable TV, and CD-ROM. The Common Intermediate Format (CIF) resolution standard for such systems requires a luminance channel with 352 pixels/line × 288 lines/frame and 8 bits/pixel, and two chrominance channels for blue (Cb) and red (Cr) with half resolution of 176 pixels/line × 144 lines/frame and 8 bits/pixel. Full-motion video imaging further uses a 30 frames/s picture transmission rate. This corresponds to a raw video data rate of about 36.5 Mbits/s. By way of example, using a good quality 28,800 bit/s modem, it would take about 21 minutes to transmit one second of raw video data over a telephone line. In a separate example, a CD-ROM with a capacity of 650 Mbytes and a data transfer rate of 150 kbytes/s can only store about 2 min and 20 seconds of uncompressed video data. In addition, at that transfer rate it takes about one second to read just a single video frame from the disc, which is obviously far too slow for motion video image processing.
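The figures quoted above can be checked with a short calculation. The following is only a sketch in Python: the variable names are illustrative, and the use of decimal units (1 Mbyte = 10^6 bytes, 1 kbyte = 10^3 bytes) is an assumption, since the text does not say which convention is intended.

```python
# Raw data rate of CIF video, using only the figures quoted in the text.
LUMA_W, LUMA_H = 352, 288          # luminance samples per line, lines per frame
CHROMA_W, CHROMA_H = 176, 144      # each of the two chrominance channels
BITS_PER_SAMPLE = 8
FRAMES_PER_SECOND = 30

MODEM_BPS = 28_800                 # modem rate quoted in the text
CDROM_BYTES = 650 * 10**6          # assumption: decimal megabytes
CDROM_BYTES_PER_S = 150 * 10**3    # assumption: decimal kilobytes


def bits_per_frame():
    """Bits in one uncompressed frame: one luma plane plus two chroma planes."""
    luma = LUMA_W * LUMA_H * BITS_PER_SAMPLE
    chroma = 2 * CHROMA_W * CHROMA_H * BITS_PER_SAMPLE
    return luma + chroma


def raw_bit_rate():
    """Bits per second of uncompressed full-motion video."""
    return bits_per_frame() * FRAMES_PER_SECOND


rate = raw_bit_rate()
modem_minutes = rate / MODEM_BPS / 60                 # time to send 1 s of video
cdrom_seconds = CDROM_BYTES * 8 / rate                # video that fits on one disc
frame_read_seconds = bits_per_frame() / 8 / CDROM_BYTES_PER_S

print(f"raw rate: {rate / 1e6:.1f} Mbit/s")
print(f"modem: {modem_minutes:.1f} min per second of video")
print(f"CD-ROM capacity: {cdrom_seconds:.0f} s of video")
print(f"CD-ROM read time per frame: {frame_read_seconds:.2f} s")
```

Under these assumptions the results agree with the approximate values stated in the text.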
For practical digital video communication and storage purposes it is thus necessary to reduce the amounts of data to be transmitted and stored by eliminating redundant information. It is further desirable to design systems which maximize the amount of data processed per unit time. One of the most promising approaches in this respect is the signal compression of digitized data which can drastically reduce the information volume to be processed, without causing a noticeable degradation of the output signal quality.
Generally, any signal compression is based on the presence of superfluous information in the original signal that can be removed to reduce the amount of data to be stored or transmitted. There are two main classes of information superfluous with respect to the intended receiver. The first one is known as statistical redundancy, which is primarily associated with similarities, correlation and predictability of data. Such statistical redundancy can theoretically be removed from the data without any information being lost.
The second class of superfluous information is known as subjective redundancy, which primarily has to do with data characteristics that can be removed without a human observer noticing degradation. A typical example would be the removal of temporal details in a motion video which occur too fast to be perceived by the human eye. Unlike statistical redundancy, the removal of subjective redundancy is typically irreversible, so that the original data cannot be fully recovered.
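The difference between the two classes can be sketched in a few lines of Python. Run-length coding and uniform quantization are used here purely as familiar stand-ins for removal of statistical and subjective redundancy respectively; neither is named in the text, and the function names, sample values and step size are hypothetical.

```python
def rle_encode(samples):
    """Run-length code: collapse repeated values into (value, count) pairs."""
    runs = []
    for value in samples:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, count) for value, count in runs]


def rle_decode(runs):
    """Exact inverse of rle_encode."""
    samples = []
    for value, count in runs:
        samples.extend([value] * count)
    return samples


def quantize(samples, step):
    """Map each sample to the index of its quantization interval."""
    return [s // step for s in samples]


def dequantize(indices, step):
    """Reconstruct each sample as the midpoint of its interval."""
    return [i * step + step // 2 for i in indices]


# Statistical redundancy: long runs of equal pixels, removed without loss.
row = [12] * 10 + [200] * 5 + [12] * 9
runs = rle_encode(row)
lossless_ok = rle_decode(runs) == row

# Subjective redundancy: fine amplitude detail discarded, not recoverable.
detail = [17, 18, 19, 130, 131, 255]
STEP = 16                       # hypothetical step size
approx = dequantize(quantize(detail, STEP), STEP)
lossy_ok = approx == detail

print(len(row), "samples ->", len(runs), "runs, exact recovery:", lossless_ok)
print("quantized recovery exact:", lossy_ok)
```

The run-length code recovers the input exactly, while the quantized samples can only be recovered to within half a step, which mirrors the reversible and irreversible cases described above.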
There are well known prior art compression and coding techniques that exploit both kinds of signal redundancy. Generally, they may be classified as predictive coding, transform coding and interpolative coding. Numerous techniques do not fall cleanly into any one of those classes, since they combine features of more than one. Special techniques have been developed to exploit the temporal signal redundancy in applications involving the transmission and storage of sequences of video images.
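As an illustration of the first of these classes, the following Python sketch shows predictive coding in its simplest form, in which each sample is predicted by the previous one and only the prediction error is kept. The previous-sample predictor and the function names are assumptions made for the example; the text names the class of technique only.

```python
def dpcm_encode(samples):
    """Replace each sample by its difference from the previous sample."""
    residuals = []
    previous = 0                # assumed initial prediction
    for sample in samples:
        residuals.append(sample - previous)
        previous = sample
    return residuals


def dpcm_decode(residuals):
    """Rebuild the samples by accumulating the prediction errors."""
    samples = []
    previous = 0
    for residual in residuals:
        previous += residual
        samples.append(previous)
    return samples


# A smoothly varying scan line: neighbouring pixels are highly correlated.
line = [100, 101, 103, 103, 104, 106]
residuals = dpcm_encode(line)
print(residuals)
print(dpcm_decode(residuals) == line)
```

After the first sample the residuals are small numbers clustered near zero, which a subsequent entropy coder can represent in fewer bits than the original 8-bit samples.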
In view of the diversity of available methods, and in order to unify the research and commercial effort in this direction, various signal compression standards have been proposed in the past. There are currently three major international standards for image compression: the JPEG standard, the Px64 standard, and the MPEG standard. These three standards address different aspects of image compression.
The JPEG standard is primarily directed toward compression of still images, but has also been employed for full-motion images. The JPEG standard recommends a combination of discrete cosine transform (DCT) and Huffman-type bit compression techniques. JPEG techniques typically yield a compression ratio of about 10:1 for monochrome images and 20:1 for color images at an acceptable quality.
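The reason a cosine transform is useful ahead of bit compression can be sketched as follows. This Python example applies an orthonormal one-dimensional DCT to a row of eight samples; JPEG itself operates on two-dimensional 8 × 8 blocks and follows the transform with quantization and Huffman-type coding, none of which is reproduced here. The function names and the sample row are hypothetical.

```python
import math


def dct(samples):
    """Orthonormal DCT-II of a list of samples."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        total = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, x in enumerate(samples))
        coeffs.append(scale * total)
    return coeffs


def idct(coeffs):
    """Inverse of dct (orthonormal DCT-III)."""
    n = len(coeffs)
    samples = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt((1 if k == 0 else 2) / n)
            total += scale * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        samples.append(total)
    return samples


def energy_fraction(coeffs, keep):
    """Share of total signal energy held by the first `keep` coefficients."""
    total = sum(c * c for c in coeffs)
    return sum(c * c for c in coeffs[:keep]) / total


# A smooth ramp, typical of a slowly varying image region.
ramp = [10, 12, 14, 16, 18, 20, 22, 24]
coeffs = dct(ramp)
restored = idct(coeffs)

print([round(c, 2) for c in coeffs])
print(f"energy in first two coefficients: {energy_fraction(coeffs, 2):.4f}")
```

For smooth input almost all of the energy is concentrated in the lowest-order coefficients, so the remaining coefficients can be coarsely quantized or dropped with little visible effect.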
Px64 is the common name for a series of standards for two-way transmission of video, denoted H.261, and audio, denoted G.711 and G.728. The H.261 standard mandates the use of very high data compression ratios of up to 200:1. The Px64 standard is intended for videophone and video conferencing applications. Such systems are capable of very high compression ratios, but generally exhibit limited spatial resolution (about 100 × 100 pixels per frame for a videophone) and poor motion rendition (about 3-10 frames per second for a videophone). As a result, such images show very little detail. In addition, as implied by the name, the Px64 standard only works with data transmission rates that are multiples of 64 kbit/s on high speed telephone lines, such as a switched Integrated Services Digital Network (ISDN). It is not suitable for, and does not perform well in, a bandlimited transmission medium such as standard analog telephone lines.
The MPEG standard, which is in many ways similar to the H.261 standard, is aimed at distributing full-motion images through such media as CD-ROMs or cable TV. Typically, the compression ratio of MPEG is lower than that of H.261.
Other compression schemes, such as Intel's Real Time Video (RTV) version of Digital Video Interactive (DVI) compression, only result in compression ratios of about 150:1.
In contrast, turning back to the full motion video example above, one can see that real-time transmission over a telephone line using the same high speed modem would require a data compression ratio of about 1270:1. Those skilled in the art would appreciate that it is extremely difficult to obtain such compression ratios using the prior art techniques.
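The required ratio, and the gap separating it from the ratios quoted for the prior art schemes, can be verified with the following Python sketch. The dictionary of quoted ratios simply restates figures given in the text; the names used are illustrative.

```python
# Raw CIF rate: one 352 x 288 luma plane and two 176 x 144 chroma planes,
# 8 bits per sample, 30 frames per second.
RAW_CIF_BPS = (352 * 288 + 2 * 176 * 144) * 8 * 30
MODEM_BPS = 28_800

# Compression ratios quoted in the text for the prior art schemes.
QUOTED_RATIOS = {
    "JPEG monochrome": 10,
    "JPEG color": 20,
    "DVI RTV": 150,
    "H.261 upper bound": 200,
}


def required_ratio(raw_bps, channel_bps):
    """Compression ratio needed to send the video in real time."""
    return raw_bps / channel_bps


def seconds_per_video_second(achieved_ratio,
                             raw_bps=RAW_CIF_BPS, channel_bps=MODEM_BPS):
    """Channel time needed per second of video at a given compression ratio."""
    return raw_bps / achieved_ratio / channel_bps


needed = required_ratio(RAW_CIF_BPS, MODEM_BPS)
shortfall = {name: seconds_per_video_second(ratio)
             for name, ratio in QUOTED_RATIOS.items()}

print(f"required ratio: about {needed:.0f}:1")
for name, seconds in shortfall.items():
    print(f"{name}: {seconds:.1f} s of channel time per second of video")
```

Even at the highest quoted ratio, each second of CIF video would occupy the modem for several seconds, so none of the quoted schemes reaches real-time operation over such a line.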
As is clear from the above, the technical problems associated with video signal transmission and storage are complex and typically require a specialized technique, adapted and optimized for a particular application. A number of specialized solutions are disclosed in the patent literature. For example, U.S. Pat. No. 5,046,122 discloses a system adapted for compression of binary image data. U.S. Pat. No. 5,121,216 discloses an adaptive transform algorithm for coding of still images. U.S. Pat. Nos. 5,086,488; 5,051,840; 5,109,451; 5,126,962 disclose different orthogonal transform coding systems for image data. Other prior art systems focus on methods of image segmentation, coding of image contours, etc. Examples are discussed in U.S. Pat. Nos. 5,091,976; 5,109,438; 5,086,489; 5,058,186; 5,115,479. Other digital techniques for compressing different signals are currently under development. Examples include the use of advanced mathematical models, such as fractals (U.S. Pat. No. 5,065,447) and wavelets. Where a human viewer is the intended receiver, systems have been proposed that use psycho-physical information to reduce the perceptual effects of errors due to information compression.
There is currently a consensus that no single compression technique is likely to succeed in all different applications. The reason for this is that the performance of digital compression and coding systems is highly scene-dependent. Thus, an efficient coding system should be able to adapt to application specific conditions and apply different techniques to different portions of the image or image sequence in order to maximize the achievable compression ratio with a minimal perceivable effect on a human viewer.
Several adaptive systems have been proposed in the past to apply different techniques to different portions of the image and thus optimally exploit the advantages of specialized processing. An example is disclosed in U.S. Pat. No. 5,121,216 where a still input image is split into blocks, each of which is orthogonally transformed. Block attributes are next computed to determine threshold information and quantization parameters, so that "busy" image blocks are separated from relatively "quiet" image blocks to use differences in the perceptual error in each case to reduce the total amount of transmitted data. The use of this technique is however limited to storage and transmission of still images.
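The general idea of separating "busy" from "quiet" blocks can be sketched in Python as follows. This is not the method of the cited patent, which derives its block attributes after an orthogonal transform; pixel variance is used here as a simple stand-in attribute, and the threshold, block size and function names are all hypothetical.

```python
def split_into_blocks(image, size):
    """Cut a 2-D list of pixels into size x size blocks, in raster order."""
    blocks = []
    for top in range(0, len(image), size):
        for left in range(0, len(image[0]), size):
            blocks.append([row[left:left + size]
                           for row in image[top:top + size]])
    return blocks


def variance(block):
    """Pixel variance of a block, used here as a stand-in activity measure."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def classify(block, threshold):
    """Label a block 'busy' if its activity exceeds the threshold."""
    return "busy" if variance(block) > threshold else "quiet"


# Left half is flat; right half is a high-contrast checkerboard.
image = [
    [50, 50, 0, 255],
    [50, 50, 255, 0],
    [50, 50, 0, 255],
    [50, 50, 255, 0],
]
THRESHOLD = 100.0               # hypothetical activity threshold
labels = [classify(b, THRESHOLD) for b in split_into_blocks(image, 2)]
print(labels)
```

A coder could then quantize the busy blocks more coarsely, on the premise that errors are less perceptible in highly textured regions.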
U.S. Pat. No. 5,091,782 discloses an apparatus and method for adaptively compressing successive blocks of digital video. The data compression method is optimized by using different compression algorithms in different data formats on a block-by-block basis. The method is especially suitable for use in interlaced television signal transmission, where the system specific separation of the video frames into even and odd fields allows the apparatus to process in parallel individual data blocks presented in frame and field formats. The compression error in each format is evaluated and the one resulting in a smaller error is selected to represent the compressed image information. While enabling the use of parallelized processing to speed up the signal compression, the proposed method is deficient in that nearly half of the computed data is thrown out for each processed block. In addition, the method does not permit the incremental increase of image quality in processing areas where the error is larger than system specifications.
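The selection step described above, in which a block is processed in both formats and the one with the smaller error is retained, can be sketched in Python as follows. The lossy operation used here (averaging adjacent line pairs) is only a placeholder for a real compressor, and the function names and test blocks are hypothetical; the cited patent's actual algorithms are not reproduced.

```python
def to_field_format(block):
    """Regroup interlaced lines: all even lines first, then all odd lines."""
    return block[0::2] + block[1::2]


def lossy_stand_in(block):
    """Placeholder compressor: replace each adjacent line pair by its average.

    Assumes the block has an even number of lines.
    """
    out = []
    for i in range(0, len(block), 2):
        average = [(a + b) / 2 for a, b in zip(block[i], block[i + 1])]
        out.extend([average, average])
    return out


def squared_error(original, processed):
    """Sum of squared pixel differences between two blocks."""
    return sum((x - y) ** 2
               for row_a, row_b in zip(original, processed)
               for x, y in zip(row_a, row_b))


def choose_format(block):
    """Process the block in both formats and keep the smaller-error one."""
    candidates = {"frame": block, "field": to_field_format(block)}
    errors = {name: squared_error(data, lossy_stand_in(data))
              for name, data in candidates.items()}
    best = min(errors, key=errors.get)
    return best, errors


# Motion between fields: even and odd lines differ strongly.
moving = [[10, 10], [90, 90], [10, 10], [90, 90]]
# Static scene: adjacent lines are alike.
static = [[10, 10], [10, 10], [90, 90], [90, 90]]

print(choose_format(moving))
print(choose_format(static))
```

The sketch also makes the stated deficiency visible: both candidates are fully processed for every block, and the result for the rejected format is discarded.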
It would thus be advantageous to provide a compression system that combines the advantages of specialized processing for different types of images and image sequences while avoiding the disadvantages of prior art techniques. It would further be advantageous to provide such a system that permits video signals in different input formats to be compressed and then reconstructed with little degradation in the motion rendition. It would also be desirable to provide a general purpose compression system for optimizing the compression of digital data by combining different techniques to obtain peak performance under different possible conditions.