This invention relates to real-time compression and encoding of digital video signals to provide for the transmission of compressed digital data through a communications channel, or for recording and playback of compressed data on a magnetic tape recorder or other recording medium. More particularly, the present invention relates to a technique for combining the encoding of, for example, the luminance and chrominance components of a common portion of an image, such that in a portion with less chrominance information content more luminance information is conveyed, and vice versa.
In general, the goal of data compression is to send digital information from one point to another through a transmission channel using the least possible amount of information transfer. In other words, the object is to eliminate the transmission of unnecessary information. Video images, by their very nature, contain a great deal of redundancy and thus are good candidates for data compression. A straightforward digital representation of an image necessarily contains much of the same redundancy, both in a spatial sense and in a temporal sense. By removing a portion of the redundancy from the image data at the transmitter, the amount of data transmitted over a communications channel or recorded on a storage medium may be substantially reduced. The image may then be reconstructed by reintroducing the redundancy at the receiver or, if recorded, in the recorder playback electronics. It is to be understood that the expression "image data" as used herein refers to data defining an image to be displayed in two dimensions, which further may take the form of a time-varying image composed of multiple video frames which are equally spaced in time. Such a signal might be a moving scene derived from a video camera.
Image compression for use in conjunction with digital video tape recorders has several unique requirements which impose additional constraints on any compression method used. The additional constraints arise from the typical modes of use of a video tape recorder, and from the fact that the data must be stored for later use rather than immediately transmitted. For example, a tape recorder must allow editing of the recorded information. In a recording system where data is formatted in fixed length sync blocks, this means that the stored data for one field should occupy an integer number of tracks on the tape or occupy defined blocks of video data, such as a television field, at predictable locations or tracks on the tape. In a system where data is formatted in interleaved sync blocks, the editing requirement imposes the less stringent constraint that a field of video data fit within a space on tape corresponding to one field of video. Such seemingly simple constraints place a severe design requirement on any compression scheme. Because most images statistically are non-stationary (that is, the statistical distribution, or complexity, varies as a function of position within the image), the obvious solution to compressing a digital signal having varying information content would be to allow the encoded data rate to vary on a frame-by-frame or field-by-field temporal basis according to the image content. But because of editing requirements, the encoded data rate should be fixed rather than variable. Thus, in the edit mode, the replacement of recorded information by new information requires that the smallest unit of information to be replaced, such as a single field in a television signal, be allotted a fixed data block length in the recorded data format. This allows any unit of a video signal to be replaced with any equally sized unit of the video signal.
Video tape recorders for television broadcast applications must also allow pictures to be reproduced at higher than normal record/playback tape transport speeds (picture-in-shuttle). At the much higher playback speeds associated with the picture-in-shuttle mode, only a fraction of the data on each track is recovered. This requires that the compressed recorded data be stored in small complete data segments substantially smaller than one track in length, whereby a most significant portion of the picture may be recovered and individually decoded even at the higher speed.
Heretofore, various digital video compression studies have focused on the two-dimensional discrete cosine transform (DCT) for use as the preferred adaptive coding vehicle, due to its superior performance in producing compressed images with low distortion over a wide range of images. (See "Discrete Cosine Transform," IEEE Transactions on Computers, vol. C-23, pp. 90-93, January 1974.) To perform a transformation on a video image, the image first is divided into blocks of contiguous pixels (e.g., 16×16 or 8×8), and then each block is cosine transformed into a set of transform coefficients, each of which represents a scalar weighting parameter (i.e., a coefficient) for a two-dimensional cosine transform function. In the cosine transform domain, the amplitude coefficients of non-zero value are concentrated at the lower frequency terms, with many of the upper frequencies being zero valued. Due to the nature of the transform and the existence of correlation in the original image, there is generally in the transformed image a large share of relatively small valued coefficients and a decreasing occurrence of large amplitude coefficients. Thus, if the coefficients are coarsely quantized into integral values and then Huffman coded, the number of bits needed to represent the image is greatly reduced.
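By way of illustration only, the block transform described above may be sketched as follows. The sketch assumes an 8×8 block and an orthonormal DCT-II built directly from its cosine basis; the function names (dct_matrix, dct2, idct2) and the smooth test block are hypothetical and are not taken from the cited article. It is intended merely to show the concentration of energy in the low frequency terms for a smoothly varying block.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (rows = frequencies, columns = samples)."""
    k = np.arange(n).reshape(-1, 1)   # frequency index
    x = np.arange(n).reshape(1, -1)   # sample index
    c = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)       # scale the DC row
    c[1:, :] *= np.sqrt(2.0 / n)      # scale the remaining rows
    return c

def dct2(block: np.ndarray) -> np.ndarray:
    """Two-dimensional DCT of a square block (transform rows, then columns)."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T

def idct2(coeffs: np.ndarray) -> np.ndarray:
    """Inverse of dct2; valid because the basis matrix is orthonormal."""
    c = dct_matrix(coeffs.shape[0])
    return c.T @ coeffs @ c

# Hypothetical smooth 8x8 block: a gentle brightness gradient.
rows, cols = np.mgrid[0:8, 0:8]
block = 100.0 + 4.0 * rows + 2.0 * cols

coeffs = dct2(block)
low_energy = float(np.sum(coeffs[:2, :2] ** 2))   # four lowest-frequency terms
total_energy = float(np.sum(coeffs ** 2))
print(f"share of energy in the 2x2 low-frequency corner: {low_energy / total_energy:.4f}")
```

For a block with sharp edges or fine texture the share held by the low frequency corner would be smaller, which is why the amount of data produced varies with image content.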
More particularly, the quantizing factor is applied to the amplitude coefficients as follows. Each amplitude coefficient is scaled by the quantizing factor (that is, divided by it, so that a larger factor yields smaller integers) and rounded to the nearest integer. The integers obtained after the scaling and rounding are encoded using any of a number of entropy coding techniques, such as Huffman coding. Since the coefficient amplitudes are, with high probability, small in value (a property of the transform as applied to images), short code words are assigned to the smaller amplitudes to achieve the shortest overall message length. It can be seen that as the quantizing factor is increased, the resulting message length will decrease monotonically. Hence, an increase in the quantizing factor causes an increase in the compression. The errors due to quantizing also increase with increasing quantizing factor, leading to increased distortion in the decoded image. Thus, if the quantization is too fine, the data generated by the Huffman coder will exceed the data rate of the channel (or recorder), while too coarse a quantization results in unacceptable distortion/noise.
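The trade-off just described may be illustrated by the following sketch, which is not a definitive implementation. It assumes that "scaling" means division by the quantizing factor, models the transform coefficients as synthetic Laplacian-distributed values (an assumption made only to obtain many small amplitudes and few large ones), and builds a Huffman code from the observed symbol counts. The names quantize, huffman_code_lengths and coded_bits are hypothetical, and the cost of transmitting the code table itself is ignored.

```python
import heapq
from collections import Counter

import numpy as np

def quantize(coeffs, q):
    """Divide by the quantizing factor q and round to the nearest integer."""
    return np.rint(np.asarray(coeffs) / q).astype(int)

def huffman_code_lengths(symbols):
    """Return {symbol: code word length in bits} for a Huffman code."""
    counts = Counter(symbols)
    if len(counts) == 1:                       # degenerate one-symbol source
        return {next(iter(counts)): 1}
    # Heap entries: (weight, tie breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def coded_bits(coeffs, q):
    """Total Huffman-coded length, in bits, of the quantized coefficients."""
    symbols = quantize(coeffs, q).ravel().tolist()
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)

# Synthetic stand-in for transform coefficients (assumption, not measured data).
rng = np.random.default_rng(0)
coeffs = rng.laplace(scale=20.0, size=4096)

factors = [1, 4, 16, 64]
bits = [coded_bits(coeffs, q) for q in factors]
errors = [float(np.mean((coeffs - q * quantize(coeffs, q)) ** 2)) for q in factors]
for q, b, e in zip(factors, bits, errors):
    print(f"q={q:3d}  coded bits={b:6d}  mean squared error={e:8.3f}")
```

On this synthetic data the coded length falls and the mean squared error rises as the quantizing factor grows, which is the behavior the paragraph above describes.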
In some instances, such as one wherein the required data rate is obtained by simply controlling the fullness of an output buffer memory and using a feedback scheme to adjust the quantizing value to maintain an equilibrium of data in the buffer, a threshold level is applied to the transformed data coefficients. That is, all values below a certain threshold are considered to be zero. This thresholding also is often considered to be quantization, and as used herein the terminology applying a "quantization" or quantizing factor is meant to include applying a threshold level value, a scaling factor or other numerical processing parameter.
It generally is desirable to vary the quantizing parameters so as to produce the smallest increase in visible distortion of a compressed video image while still providing a desired output data rate. Which parameter may be varied to best advantage (for example, the threshold versus the quantizing factor) itself varies as the data rate changes as a function of the information content of the image. Since the information content changes, different sources of data and, to a lesser degree, different images are optimally quantized by different strategies. The distortion problem is particularly acute in the many television applications in which reprocessed image quality is important. It also is acute in most applications that require multiple generations of compression, that is, multiple compression/decompression cycles, to be made without noticeable degradation.
A basic consideration in the process of compressing data is that of bit allocation; that is, the determination of how many bits are allotted to each coefficient in a block of, for example, the cosine transform coefficients of previous mention. There are a given number of bits available for encoding, for example, a field of video, based on a channel bandwidth or on a capacity for storing the encoded data. For example, a field of video lasts 1/60 of a second and is derived by scanning a pixel array of the order of 720 by 244 pixels. Together these figures define the source data rate. The object of the compression process is to reduce this data rate to fit it into a channel having a capacity of a selected number of bits per second, that is, to fit it into a prescribed channel data space. Thus, for example, when dealing with cosine transform coefficients, a determination must be made concerning the proportion of available bits to allot to the low frequency coefficients of the field of video as well as to the high frequency coefficients. To this end, there are algorithms available for providing such bit allocations which allot the appropriate number of bits to the signals in proportion to their complexity. Relatively few bits are allotted to signals with little energy, with more bits allotted to signals of more energy, thereby minimizing the distortion in the image. An example of such an algorithm may be found in the article "Block Quantization of Correlated Gaussian Random Variables," Y. Hwang, et al., IEEE Trans. on Communication Systems, pp. 289-296, September 1963.
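The arithmetic and the allocation principle above may be sketched as follows. The pixel rate uses only the figures given in the text; the number of bits per sample is not stated and is therefore left out. The allocation function uses a log-variance rule of the kind commonly found in the rate-distortion literature, offered here as a stand-in and not necessarily the exact algorithm of the cited article. The name allocate_bits and the example variances are hypothetical.

```python
import numpy as np

# Source pixel rate from the figures in the text (array size is approximate).
PIXELS_PER_FIELD = 720 * 244
FIELDS_PER_SECOND = 60
pixel_rate = PIXELS_PER_FIELD * FIELDS_PER_SECOND   # pixels per second

def allocate_bits(variances, total_bits):
    """Real-valued, non-negative bit allocation that favors high-energy terms.

    Each active term receives the average share plus half the base-2 log of
    its variance relative to the geometric mean of the active variances.
    Terms that would receive a negative share are dropped and the rule is
    re-applied to the remainder.
    """
    var = np.asarray(variances, dtype=float)
    bits = np.zeros_like(var)
    active = var > 0
    while active.any():
        log_var = np.log2(var[active])
        alloc = total_bits / active.sum() + 0.5 * (log_var - log_var.mean())
        if (alloc >= 0).all():
            bits[active] = alloc
            break
        idx = np.flatnonzero(active)
        active[idx[alloc < 0]] = False               # drop and retry
    return bits

# Hypothetical coefficient variances, ordered from low to high frequency.
variances = [400.0, 100.0, 25.0, 1.0]
print("pixels per second:", pixel_rate)
print("generous budget:", np.round(allocate_bits(variances, 12.0), 2))
print("tight budget:   ", np.round(allocate_bits(variances, 2.0), 2))
```

With a tight budget the lowest-energy terms receive no bits at all, which corresponds to discarding the smallest high frequency coefficients.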
Data representing images typically have more than one component defining the image at any particular location. For example, color images in a color television system consist of three components for each spatial location: a luminance component and two color difference components. There tends to be more detail in the luminance component, and the color difference components tend to exhibit low color contrast. This is generally true in pictures of naturally occurring scenes. On the other hand, computer generated images such as provided in graphics systems may have strong detail and high contrast in the color difference components. Thus, in the process of bit allocation, if the analysis of how many bits are allotted to luminance and chrominance is based on natural scenes, a compression process performed on a graphics image will generally result in insufficient bits being allotted to the more complex chrominance.
The typical compromise made is that each of the luminance and color difference components is assigned a certain fixed number of bits. In general, since the luminance contains more information than the chrominance in the average natural scene, a greater proportion of the total bits available per pixel is allotted to the luminance component than to the chrominance component. In such a situation, if the chrominance component with the smaller bit allocation has high contrast and detail, as occurs frequently in the computer generated graphics of previous mention, the rendition of that component will be poor, resulting in poor overall image quality.
Variable length coding of compressed data, such as is performed in Huffman coding, requires that the degree of compression be adjusted so that the resulting coded data just fills the fixed data space available for it in the channel to be used. In the case of color television image data compression, in which the data has been transformed prior to encoding as, for example, by means of the discrete cosine transform (DCT) process, the degree of compression may be controlled by selecting a quantizing factor which is applied to the amplitude coefficients of the frequency terms of the transformed image before encoding.
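One possible way of selecting a quantizing factor so that the coded data just fills a fixed data space is sketched below, under stated assumptions. The coded size is estimated from the empirical entropy of the quantized values, used here as a stand-in for an actual Huffman coder; the search is a simple bisection that relies on the coded size falling as the quantizing factor rises. The names estimated_bits and find_quantizing_factor, the search limits and the synthetic coefficients are all hypothetical.

```python
import numpy as np

def estimated_bits(coeffs, q):
    """Ideal entropy-coded size in bits (stand-in for a Huffman coder)."""
    symbols = np.rint(np.asarray(coeffs) / q).astype(int).ravel()
    _, counts = np.unique(symbols, return_counts=True)
    return float(-(counts * np.log2(counts / counts.sum())).sum())

def find_quantizing_factor(coeffs, budget_bits, q_lo=0.01, q_hi=1e6, iters=50):
    """Bisect for the finest quantizing factor whose coded size fits the budget.

    q_hi always satisfies the budget, so the returned value is safe to use
    even where the size estimate is not perfectly monotonic.
    """
    if estimated_bits(coeffs, q_hi) > budget_bits:
        raise ValueError("budget cannot be met even at the coarsest factor")
    for _ in range(iters):
        q_mid = float(np.sqrt(q_lo * q_hi))      # geometric midpoint
        if estimated_bits(coeffs, q_mid) > budget_bits:
            q_lo = q_mid                         # too many bits: quantize coarser
        else:
            q_hi = q_mid                         # fits: try a finer factor
    return q_hi

# Synthetic stand-in for one data block's coefficients (assumption).
rng = np.random.default_rng(1)
coeffs = rng.laplace(scale=20.0, size=4096)
BUDGET_BITS = 2 * coeffs.size                    # hypothetical fixed data space

q = find_quantizing_factor(coeffs, BUDGET_BITS)
used = estimated_bits(coeffs, q)
print(f"q={q:.3f} uses {used:.0f} of {BUDGET_BITS} bits")
```

A practical coder would measure the real code word lengths instead of the entropy estimate, but the search structure would be the same.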
When separate amounts of bits within the data format, that is, the data space, have been allocated to the luminance and chrominance components of the television signal, the quantizing factor must be determined for each component independently. Since the data space must be allocated in such a way as to provide for adequate performance in the "worst-case" situation of each component, situations frequently arise in which one component is distorted because it needs more data space, while the other component under-utilizes its space due to lower, or zero, needs in that particular image. A common case is when the image is of low color saturation or even uncolored, as in a monochrome image. Overall performance then is better if most or all of the chrominance data space is made available for luminance. Conversely, computer generated images with high chrominance content are best served if some luminance space is borrowed for use by the chrominance component, because a larger proportion of energy is concentrated in the chroma component. Otherwise, noise in the chrominance is more visible owing to an insufficient data allocation to the chroma component. That is, insufficient data space allocation results in more compression of the chrominance component by coarser quantization, which, in turn, results in increased distortion.
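The under-utilization problem described above may be illustrated with the following sketch. It compares a fixed split of the data space, with a quantizing factor found independently for luminance, against a pooled data space in which both components share a single quantizing factor. The shared-factor scheme is an assumption chosen only for illustration and is not asserted to be the method of the invention; the names estimated_bits and fit_factor, the budgets and the synthetic near-monochrome data are likewise hypothetical, and coded size is again estimated from empirical entropy.

```python
import numpy as np

def estimated_bits(coeffs, q):
    """Ideal entropy-coded size in bits (stand-in for a Huffman coder)."""
    symbols = np.rint(np.asarray(coeffs) / q).astype(int).ravel()
    _, counts = np.unique(symbols, return_counts=True)
    return float(-(counts * np.log2(counts / counts.sum())).sum())

def fit_factor(components, budget_bits, q_lo=0.01, q_hi=1e6, iters=50):
    """Bisect for the finest common factor whose total coded size fits.

    Each component keeps its own code statistics; only the quantizing
    factor and the budget are shared between them.
    """
    def total(q):
        return sum(estimated_bits(c, q) for c in components)
    if total(q_hi) > budget_bits:
        raise ValueError("budget cannot be met even at the coarsest factor")
    for _ in range(iters):
        q_mid = float(np.sqrt(q_lo * q_hi))
        if total(q_mid) > budget_bits:
            q_lo = q_mid
        else:
            q_hi = q_mid
    return q_hi

# Synthetic near-monochrome image: detailed luminance, almost no chrominance.
rng = np.random.default_rng(2)
luma = rng.laplace(scale=20.0, size=4096)
chroma = rng.laplace(scale=0.1, size=2048)

LUMA_BUDGET, CHROMA_BUDGET = 8192, 4096          # hypothetical fixed split

q_luma_fixed = fit_factor([luma], LUMA_BUDGET)
q_shared = fit_factor([luma, chroma], LUMA_BUDGET + CHROMA_BUDGET)
print(f"luminance factor with fixed split: {q_luma_fixed:.2f}")
print(f"common factor with pooled space:   {q_shared:.2f}")
```

In this example the pooled arrangement arrives at a finer quantizing factor for the luminance because the chrominance leaves most of its space unused; reversing the energy of the two components would be expected to favor the chrominance in the same way.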