1. Field of the Invention
This invention relates to adaptive coding and to the real-time compression of digital signals suitable for transmission through a communications channel or for recording and playback on a magnetic tape recorder or other recording medium. More particularly, the present invention relates to a feedforward technique for estimating a variable quantization parameter so that the compressed data, when recorded, fits in the space allotted on the recording medium.
2. Description of Prior Art
In general, the goal of data compression is to send digital information from one point to another through a transmission channel while transferring as little data as possible. In other words, the object is to eliminate the transmission of unnecessary information. Video images, by their very nature, contain a great deal of redundancy and thus are good candidates for data compression. A straightforward digital representation of an image necessarily contains much of the same redundancy, both in a spatial sense and a temporal sense. By removing a portion of the redundancy from the image data at the transmitter, the amount of data transmitted over a communications channel or recorded on a storage medium may be substantially reduced. The image may then be reconstructed at the receiver or, if recorded, in the recorder playback electronics by reintroducing the redundancy. (The expression "image data" as used herein refers to data defining an image to be displayed.)
From a very general perspective, there are two classes of data compression: lossless compression and lossy compression. Lossless compression, as the name implies, allows the original data to be exactly reconstructed after being compressed without any loss of information. Lossy data compression is an irreversible process which introduces some amount of distortion into the compressed data so that the original data cannot be exactly reproduced. In order to obtain large compression factors for images, it is necessary to use lossy compression methods of the type described herein. Lossy compression may be an acceptable alternative as long as the amount and type of distortion produced in the reconstructed image are not objectionable. For example, in the professional video industry where the S/N is typically 57 dB or better, the compressed image must be virtually indistinguishable from the original, i.e., any more than 2 or 3 dB of signal impairment is objectionable since it is noticeable to viewers of a video display.
Image compression for use in conjunction with digital video tape recorders has several unique requirements which impose additional constraints on any compression method used. The unusual constraints arise from the typical modes of use of a video tape recorder, and from the fact that the data must be stored for later use rather than immediately transmitted. For example, a tape recorder must allow editing of the recorded information. Practically, this means that the stored data for one field must occupy an integer number of tracks on the tape or occupy defined blocks of video data, such as a television field, at predictable locations or tracks on the tape. This imposes the constraint that a field of data, or a transmitted and/or recorded data block, be constant in length. Such a seemingly simple constraint places a severe design requirement on any compression scheme. Because most images statistically have nonuniform probability density functions, the obvious solution to a digital signal having varying information content would be to allow the encoded data rate to vary on a frame-by-frame or field-by-field temporal basis according to the image content. But because of editing requirements, the encoded data rate must be fixed rather than variable.
Video tape recorders for television broadcast applications must also allow pictures to be reproduced at other than normal record/playback tape transport speeds. At the much higher playback speeds associated with the picture in shuttle mode, only a fraction of the data on each track is recovered. This requires that the compressed recorded data be stored in small complete data blocks from which a portion of the picture may be recovered. An editing feature in a recorder also places additional constraints on a compression method. In the edit mode, recorded information is replaced by new information, which requires that the smallest unit of information to be replaced (in television signals this is a single field) be allotted a fixed space in the recorded data format. This allows any unit of a video signal to be replaced with any equally sized unit of a video signal. To maintain maximum efficiency in recording, and to minimize gaps for record over-runs, it is best to use a record format which has a fixed short period related to the original uncompressed information. This simplifies the design of the data deformatter by providing a regular and expected structure to the data stream recovered from tape. This regular structure allows "intelligent" deformatting of the data because certain patterns may be identified as errors and ignored.
Heretofore, various digital video compression studies have focussed on the two-dimensional discrete cosine transform (the DCT) for use as the preferred adaptive coding vehicle, due to its energy compaction properties and relative ease of implementation with digital circuits. (See "Discrete Cosine Transform," IEEE Transactions on Computers, vol. C-23, pp. 90-93, Jan. 1974.) To perform a transformation on a video image, the image is first divided into blocks of pixels (e.g., 16×16 or 8×8), and then cosine transformed into a set of transform coefficients, each of which represents a scalar weighting parameter (i.e., a coefficient) for a two-dimensional cosine transform function. In the cosine transform domain, the amplitude coefficients are concentrated at the lower frequency terms, with many of the upper frequencies being zero valued. If the coefficients are coarsely quantized into integral values and then Huffman coded, the number of bits needed to represent the image is greatly reduced. A key factor in making this scheme work effectively is the quantizing process. If the quantization is too fine, the data generated by the Huffman coder will exceed the data rate of the channel (or recorder), while too coarse a quantization results in unacceptable distortion/noise. One technique for determining a suitable quantization parameter for the required data rate simply monitors an output buffer memory and, using a feedback scheme, adjusts the quantization level to maintain an equilibrium of data in the buffer. This method is described in the article "Scene Adaptive Coder" by Chen et al., appearing in IEEE Transactions on Communications, Vol. COM-32, No. 3 (March 1984). It is also described in U.S. Pat. No. 4,302,775. However, in recording processes, methods utilizing buffer fullness do not lend themselves to accurate rate control over small amounts of information, and thus do not enable efficient and accurate editing and picture in shuttle.
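The transform-and-quantize step described above may be illustrated by the following minimal sketch, which is not taken from the cited references. It assumes an 8×8 block, an orthonormal DCT-II, and a single uniform scale factor `q`; the function names `dct_2d`, `quantize` and `count_nonzero` are hypothetical, and the Huffman coding stage is omitted.

```python
import math

N = 8  # block size; 8x8 is one of the sizes mentioned in the text


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def norm(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for x in range(N):
                for y in range(N):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = norm(u) * norm(v) * acc
    return out


def quantize(coeffs, q):
    """Divide every coefficient by the scale factor q and round to an integer.
    The single scale factor is an assumption made for illustration."""
    return [[round(c / q) for c in row] for row in coeffs]


def count_nonzero(levels):
    """Number of nonzero quantized levels, a rough proxy for coded size."""
    return sum(1 for row in levels for v in row if v != 0)


# A smooth gradient block: most of its energy lands in low-frequency terms.
block = [[16 * x + 2 * y for y in range(N)] for x in range(N)]
coeffs = dct_2d(block)
fine = quantize(coeffs, 1.0)     # fine quantization keeps more coefficients
coarse = quantize(coeffs, 32.0)  # coarse quantization zeroes more of them
print(count_nonzero(fine), count_nonzero(coarse))
```

In this sketch a larger `q` can only reduce the number of nonzero levels, which mirrors the trade-off in the text: finer quantization produces more data to code, and coarser quantization produces less data at the cost of more distortion.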
Bit allocation methods as utilized in the past do not produce the quality of images that are desired if a relatively wide range of different images defined by the data are to be reduced.
In some instances, such as the one described immediately above, a threshold level is applied to the transformed data coefficients. That is, all values below a certain threshold are considered to be zero. This thresholding also is often considered to be quantization, and as used herein the terminology applying a "quantization" or quantizing parameter is meant to include applying a threshold level value, a scaling factor or other numerical processing parameter.
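The two kinds of quantizing parameter named above, a threshold level and a scaling factor, can be contrasted in a short sketch. It assumes that the threshold is compared against coefficient magnitude and that scaling is followed by rounding to the nearest integer; the function names and sample values are hypothetical.

```python
def apply_threshold(coeffs, threshold):
    """Set to zero every coefficient whose magnitude is below the threshold
    (comparison on magnitude is an assumption made for illustration)."""
    return [0 if abs(c) < threshold else c for c in coeffs]


def apply_scale(coeffs, scale):
    """Divide each coefficient by a scaling factor and round to an integer."""
    return [round(c / scale) for c in coeffs]


sample = [120.0, -30.5, 4.2, -1.1, 0.3]  # illustrative coefficient values
print(apply_threshold(sample, 5.0))
print(apply_scale(sample, 10.0))
```

Both operations discard small coefficients, which is why the text groups them under the single term "quantization".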
It is generally desirable to vary the quantizing parameters to produce the smallest increase in visible distortion of a compressed video image while still providing a desired output data rate. The parameter that may be changed to best advantage itself varies with the data rate, which is a function of the information content of the image. Different sources of data and, to a lesser degree, different images are optimally quantized by different strategies, since their information content differs. The distortion problem is particularly acute in many television applications in which reprocessed image quality is important. It is also necessary for most such applications that multiple generations of compression, i.e., multiple compression/expansion cycles, be made without a noticeable increase in degradation with successive generations.
It is a general object of this invention to provide a data-compression scheme operable in real-time and suitable for recording and playback on a magnetic tape recorder or other recording medium.
It is another object of this invention to provide an apparatus and method of adaptive image coding using a feedforward technique for estimating a variable quantization parameter.
It is a further object of this invention to provide an iterative process for estimating a quantization parameter Q for scaling encoded compressed data so that the original input data can be stored in a smaller fixed space on the recording medium (magnetically, optically or electrically) than would otherwise be possible without the data compression process.
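By way of illustration only, an iterative search for a quantization parameter Q that fits a fixed space may be sketched as follows. This is not the process of the invention, which is not detailed in this section. The sketch substitutes a plain bisection search, and the bit-cost model in `estimate_bits`, the search limits and the sample coefficients are all hypothetical.

```python
def estimate_bits(coeffs, q):
    """Hypothetical cost model: a zero level costs 1 bit and a nonzero level
    costs 2 * bit_length + 1 bits. It stands in for a real entropy coder."""
    total = 0
    for c in coeffs:
        level = abs(round(c / q))
        total += 1 if level == 0 else 2 * level.bit_length() + 1
    return total


def find_q(coeffs, budget_bits, q_lo=1.0, q_hi=1024.0, iterations=20):
    """Bisection for a small q in [q_lo, q_hi] whose estimated cost fits the
    budget. Relies on the cost not increasing as q grows."""
    if estimate_bits(coeffs, q_lo) <= budget_bits:
        return q_lo  # the finest allowed quantization already fits
    if estimate_bits(coeffs, q_hi) > budget_bits:
        raise ValueError("budget cannot be met within the allowed range of q")
    for _ in range(iterations):
        mid = (q_lo + q_hi) / 2.0
        if estimate_bits(coeffs, mid) <= budget_bits:
            q_hi = mid  # mid fits, so try a finer quantization
        else:
            q_lo = mid  # mid overflows, so quantize more coarsely
    return q_hi  # q_hi always satisfies the budget


coeffs = [800.0, -240.0, 95.0, -40.0, 12.0, 6.0, -3.0, 1.0]
q = find_q(coeffs, budget_bits=30)
print(q, estimate_bits(coeffs, q))
```

Because the search evaluates the data before it is written, the chosen value is known to fit the allotted space in advance, in contrast to a feedback scheme that reacts to buffer fullness after the fact.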