This invention relates to the real-time compression of digital video signals suitable for the transmission of digital data either through a communications channel, or for recording and playback on a magnetic tape recorder or other recording medium. More particularly, the present invention relates to a technique for shuffling of image data to equalize the information content thereof prior to compressing the data into fixed code block lengths, which then may be recorded in, and recovered from, a recording medium.
In general, the goal of data compression is to send digital information from one point to another through a transmission channel while transferring as little data as possible. In other words, the object is to eliminate the transmission of as much redundant data as possible. Video images, by their very nature, contain a great deal of redundancy and thus are good candidates for data compression. It is also well known that, on a statistical basis, a straightforward digital representation of an image necessarily contains redundancy both in a spatial sense and in a temporal sense. By removing a portion of the redundancy from the image data at a transmitter, the amount of data transmitted over a communications channel or recorded on a storage medium may be substantially reduced. The image then may be reconstructed at the receiver or, if recorded, in the recorder playback electronics, by reintroducing the redundancy removed at the transmitter. Incidentally, the expression "image data" as used herein refers to data defining a two-dimensional spatial image to be displayed, which may also take the form of a time-varying image composed of multiple frames which are equally spaced in time. The exact form or structure of "image data" can take any of a variety of well-known forms, and in this regard it should be noted that the present invention is broadly applicable to any type of signal representing a two-dimensional space. For example, such a signal might be a moving scene derived from a video camera.
From a very general perspective, there are two classes of data compression: lossless compression and lossy compression. Lossless compression, as the name implies, allows the original data to be exactly reconstructed after being compressed without any loss of information. By contrast, lossy data compression is an irreversible process which introduces some amount of distortion into the compressed data so that the original data cannot be exactly reproduced. In order to obtain large compression factors for images, it is necessary to use lossy compression methods of the type described herein. Lossy compression may be an acceptable alternative as long as the amount and type of distortion produced in the reconstructed image are not objectionable. However, what is deemed "objectionable" in one industry is not so in another. For example, in the professional video industry, where a signal-to-noise ratio of 50 decibels (dB) or better is typically required, the reconstructed image must be virtually indistinguishable from the original, i.e., any more than 2 or 3 dB of signal impairment is objectionable since it is noticeable to viewers of a video display.
Image compression for use in conjunction with digital video tape recorders has several unique requirements which impose additional constraints on any compression method used. The unusual constraints arise from the typical modes of use of a video tape recorder, and from the fact that the data must be stored for later use rather than immediately transmitted. For example, a tape recorder must allow editing of the recorded information. Practically, this means that the stored data for one field must occupy an integral number of tracks on the tape, or that defined blocks of video data, such as a television field, must occupy predictable locations or tracks on the tape (especially for editing purposes). This imposes the constraint that a field of data be constant in length. Such a seemingly simple constraint places a severe design requirement on any compression scheme. Because most images statistically are non-stationary (that is, the statistical distribution of the information content varies as a function of position within the image), the obvious solution to compressing a digital signal having varying information content would be to allow the encoded data rate to vary on a frame-by-frame or field-by-field temporal basis according to the information content of the image. But because of editing requirements, the encoded data rate must be fixed rather than variable. In the edit mode, the replacement of recorded information by new information requires that the smallest unit of information to be replaced (in television signals this is a single field) be allotted a fixed data block length in the recorded data format. This allows any unit of a video signal to be replaced with any equally sized unit of a video signal.
Video tape recorders for television broadcast applications must also allow pictures to be reproduced at higher than normal record/playback tape transport speeds (picture in shuttle). At the much higher playback speeds associated with the picture in shuttle mode, only a fraction of the data on each track is recovered. This requires that the compressed recorded data be stored in small complete data blocks from which a most significant portion of the picture may be recovered even at the higher speed.
To maintain maximum efficiency in recording and to minimize gaps for record over-runs, it is best to use a record format which has a fixed short period related to the original uncompressed information. This simplifies the design of the data deformatter by providing a regular and expected structure for the data stream recovered from tape. This regular structure allows "intelligent" deformatting of the data because certain patterns may be identified as errors and ignored.
Heretofore, various digital video compression studies have focused on the two-dimensional discrete cosine transform (the DCT) for use as the preferred adaptive coding vehicle, due to its energy compaction properties and relative ease of implementation with digital circuits. (See "Discrete Cosine Transform," IEEE Transactions on Computers, Vol. C-23, pp. 90-93, Jan. 1974.) To perform a transformation on a video image, the image is first divided into blocks of pixels (e.g., 16×16 or 8×8), and then cosine transformed into a set of transform coefficients, each of which represents a scalar weighting parameter (i.e., a coefficient) for a two-dimensional cosine transform function. In the cosine transform domain, the amplitude coefficients are concentrated at the lower frequency terms, with many of the upper frequencies being zero valued. If the coefficients are coarsely quantized into integral values and then Huffman coded, the number of bits needed to represent the image is greatly reduced. A key factor in making this scheme work effectively is the quantizing process. If the quantization is too fine, the data generated by the Huffman coder will exceed the data rate of the channel (or recorder), while too coarse a quantization results in unacceptable distortion/noise.
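By way of illustration only, the energy compaction and quantization behavior described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the circuitry of any particular coder: the orthonormal scaling of the transform, the function names (dct_2d, quantize, count_nonzero), the smooth test block and the two step sizes are all hypothetical choices made for the example.

```python
import math

N = 8  # block size; the text mentions 8x8 and 16x16 blocks


def dct_2d(block):
    """Orthonormal two-dimensional DCT-II of an N x N block (list of lists)."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def quantize(coeffs, step):
    """Uniform quantization: divide by the step size and round to an integer."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


def count_nonzero(q):
    """Number of quantized coefficients that an entropy coder would still have to code."""
    return sum(1 for row in q for c in row if c != 0)


# Hypothetical smooth block (a gentle brightness ramp), such as a patch of sky.
smooth_block = [[100 + 2 * x + y for y in range(N)] for x in range(N)]
coeffs = dct_2d(smooth_block)

fine = quantize(coeffs, 1.0)     # fine quantization: more nonzero terms survive
coarse = quantize(coeffs, 16.0)  # coarse quantization: fewer nonzero terms survive

print("nonzero coefficients, fine step:  ", count_nonzero(fine))
print("nonzero coefficients, coarse step:", count_nonzero(coarse))
```

For a smooth block of this kind, most of the 64 coefficients quantize to zero, and the coarser step leaves fewer nonzero terms for a subsequent entropy coder such as a Huffman coder. That coder is not modeled here.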
One technique for determining a suitable quantization parameter for the required data rate simply monitors an output buffer memory and uses a feedback scheme to adjust the quantization level to maintain an equilibrium of data in the buffer. Thus, in a less complex part of an image, less data enters the buffer memory and its contents decrease, while in a more complex part of the image the buffer input data rate increases and the buffer contents increase. This method is described in the article, "Scene Adaptive Coder" by Chen et al., appearing in IEEE Transactions on Communications, Vol. COM-32, No. 3 (March 1984). It is also described in U.S. Pat. No. 4,302,775. However, in recording processes, methods utilizing buffer fullness do not lend themselves to accurate rate control over small amounts of information, and thus do not enable efficient and accurate editing and picture in shuttle. Bit allocation methods as utilized in the past do not produce the quality of images that are desired if a relatively wide range of different images defined by the data is to be reduced.
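The general idea of buffer-fullness feedback can be illustrated with a toy simulation. The sketch below is not the method of the cited article or patent. It assumes, purely for illustration, that the bits produced for a block equal a "complexity" figure divided by the quantizer step, and that the step is a linear function of buffer fullness. All names and numbers are hypothetical.

```python
def simulate_buffer_feedback(complexities, drain_per_block, capacity,
                             q_min=1.0, q_max=64.0):
    """Toy model of rate control driven by output buffer fullness.

    Assumptions (not from the source): bits produced = complexity / q, and
    q rises linearly from q_min (empty buffer) to q_max (full buffer).
    Returns one (q, fill, produced) tuple per coded block.
    """
    fill = 0.0
    history = []
    for c in complexities:
        # A fuller buffer selects a coarser quantizer step.
        q = q_min + (q_max - q_min) * fill / capacity
        produced = c / q
        # The buffer gains the coded bits and drains at the fixed channel rate.
        fill = min(capacity, max(0.0, fill + produced - drain_per_block))
        history.append((q, fill, produced))
    return history


# Hypothetical scene: 300 low-complexity blocks followed by 300 complex blocks.
scene = [400.0] * 300 + [3200.0] * 300
history = simulate_buffer_feedback(scene, drain_per_block=100.0, capacity=10000.0)

q_simple = history[299][0]   # step in use at the end of the simple region
q_complex = history[599][0]  # step in use at the end of the complex region
print("step in simple region: ", round(q_simple, 2))
print("step in complex region:", round(q_complex, 2))
```

In this model the step settles only after the buffer has drifted to a new equilibrium, many blocks after the picture content changes. This is consistent with the observation above that buffer-fullness methods do not provide accurate rate control over small amounts of information.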
In some instances, such as the one described immediately above, a threshold level is applied to the transformed data coefficients. That is, all values below a certain threshold are considered to be zero. This thresholding also is often considered to be quantization, and as used herein the terminology of applying a "quantization" or "quantizing" parameter is meant to include applying a threshold level value, a scaling factor or other numerical processing parameter.
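As a minimal sketch of this broad sense of "quantizing parameter", the following Python fragment applies a threshold and a scaling factor to a list of coefficients. The function name, its arguments and the sample coefficient values are hypothetical and were chosen only for illustration.

```python
def apply_quantizing_parameter(coeffs, threshold=0.0, scale=1.0):
    """Zero every coefficient whose magnitude is below `threshold`,
    then divide the survivors by `scale` and round to an integer."""
    out = []
    for c in coeffs:
        if abs(c) < threshold:
            out.append(0)  # thresholding: small values are treated as zero
        else:
            out.append(int(round(c / scale)))  # scaling followed by rounding
    return out


# Made-up transform coefficients, ordered from low to high frequency.
sample = [884.0, -36.4, -18.2, -3.8, -1.9, 0.6, -0.3]
quantized = apply_quantizing_parameter(sample, threshold=2.0, scale=4.0)
print(quantized)  # [221, -9, -5, -1, 0, 0, 0]
```

Raising either the threshold or the scaling factor increases the number of zero-valued outputs. This lowers the coded data rate at the cost of added distortion.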
It is generally desirable to vary the quantizing parameters to produce the smallest increase in visible distortion of a compressed video image while still providing a desired output data rate. The parameter that may be changed to best advantage itself varies with the data rate, which in turn is a function of the information content of the image. Different sources of data, and to a lesser degree different images, are optimally quantized by different strategies since the information content thereof changes. The distortion problem is particularly acute in many television applications in which reprocessed image quality is important. It is also necessary in most of such applications that multiple generations of compression, that is, multiple compression/expansion cycles, be made without noticeable degradation.
In order to circumvent the problems created when attempting to compress data such as, for example, video signals to be recorded via video tape recorders, wherein specific units of video data must fit within an allotted recorded data sync block length, the video image data preferably should not be taken sequentially, nor should the data be successively taken from the same area of the image. When the data are taken sequentially in this manner, the portions of the image with low complexity are finely quantized, whereas the complex portions of the image are coarsely quantized. The resulting picture is very high in quality in low complexity areas and poor in quality in the complex areas.
By way of illustration it is assumed that an image to be compressed and recorded or transmitted, is a scene of a harbor including the relative complexity of boats, people and shops against a background of ocean and a clear blue sky. If data is taken sequentially, the areas of lower complexity such as the sky or ocean can be encoded with fewer bits, while the area of the scene with the boats and people is of greater complexity and requires a greater number of encoding bits to prevent image distortion. If the encoded data is to be fitted into preselected fixed length sync blocks, such as when recording on tape, but the data is taken sequentially, the areas of sky and ocean are allotted the same number of bits per sync block as are the areas of boats and people. However, since the sky or ocean is less complex, only a small fraction of bits in the sync block are needed by the coded transform coefficients to completely encode the information. The remaining bits of the sync block are simply assigned zeros and therefore are wasted. Further along the process of compressing the harbor scene data, the more complex data corresponding to the boats and people are encoded, and now there are insufficient bits available to encode the data without distortion. That is, the number of bits required to properly encode the more complex portions of the image with minimum distortion, will not fit within the space allotted on the tape. Thus, the complex portions of the image must be coarsely quantized to "force" the information to fit within the fixed sync block length. This is true even though the fixed sync block lengths which contain the less complex sky and ocean image data are finely quantized with much wasted space within each block length. It may be seen that this problem of efficiently apportioning bits to image areas of differing complexities is compounded by supplying the data corresponding to the image in sequential order, as is commonly done.
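The harbor illustration can be put into rough numbers with a short Python sketch. The figures below (bits needed per image block, image blocks per sync block, sync block capacity) are invented for the example. The simple stride interleave merely stands in for taking data non-sequentially and is not presented as the shuffling technique of the invention.

```python
def pack_stats(demands, blocks_per_sync, sync_capacity):
    """Group per-block bit demands into fixed-length sync blocks.

    Returns (wasted, overflow): padding bits left unused, and bits that do
    not fit and would therefore force coarser quantization.
    """
    wasted = overflow = 0
    for i in range(0, len(demands), blocks_per_sync):
        need = sum(demands[i:i + blocks_per_sync])
        if need <= sync_capacity:
            wasted += sync_capacity - need
        else:
            overflow += need - sync_capacity
    return wasted, overflow


def stride_interleave(items, stride):
    """Reorder items so that neighbors in the output come from distant positions."""
    return [items[j] for s in range(stride) for j in range(s, len(items), stride)]


# Hypothetical harbor scene: 32 sky/ocean blocks needing 20 bits each,
# followed by 32 boats/people blocks needing 180 bits each.
demands = [20] * 32 + [180] * 32

sequential = pack_stats(demands, blocks_per_sync=4, sync_capacity=400)
interleaved = pack_stats(stride_interleave(demands, 32),
                         blocks_per_sync=4, sync_capacity=400)

print("sequential  (wasted, overflow):", sequential)   # (2560, 2560)
print("interleaved (wasted, overflow):", interleaved)  # (0, 0)
```

With sequential ordering, the sky and ocean sync blocks are mostly padding while the boats and people sync blocks are oversubscribed by the same amount. When each sync block instead draws from both regions, the same total demand fits exactly in this idealized example.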
Thus, it may be seen that there are two conflicting requirements when attempting to use data compression techniques in combination with professional video recorders. On the one hand it is desirable from the standpoint of video recorder design to allocate each sync block a fixed segment on the recording medium. On the other hand it is desirable from the standpoint of efficient data compression to allocate a variable output format to provide an image transmission with minimal distortion. Thus, what is desired is a way of satisfying these two seemingly conflicting requirements.