This invention relates to image encoding and, more particularly, to high efficiency image encoding using an orthogonal transform, such as the discrete cosine transform, to produce a sync block of encoded image data, wherein the sync block is of a fixed length and exhibits maximum data volume.
Various encoding techniques have been developed for compressing image information, such as video information, for the purpose of transmitting image data or recording image data, such as by magnetic recording. Among these compression techniques are predictive coding, transform coding and vector quantization.
In transform coding, two perpendicular axes are used to transform image data samples, and the amount of data needed to represent the original image information, referred to as "data volume", is reduced by relying on uncorrelated data. For orthogonal transform coding, the basic vectors are, of course, perpendicular to each other and the mean power of the conversion coefficients that are produced by orthogonal transform is substantially equal to the mean signal power of the image data that is presented prior to transform.
The conversion coefficients produced by orthogonal transform are known to have a DC component and several AC components, and the low frequency AC components generally exhibit a higher power concentration than the high frequency AC components. This permits the higher frequency components of the conversion coefficients to be ignored, thus reducing the data volume needed to represent the original image, without serious degradation in the image that is reproduced from these conversion coefficients. Examples of orthogonal transform techniques include the Hadamard transform, Karhunen-Loeve transform, slant transform, discrete sine transform and discrete cosine transform. The use of discrete cosine transform has become quite advantageous, and one example of the use thereof is described in U.S. Pat. No. 5,006,931, assigned to the assignee of the present invention.
In discrete cosine transformation (sometimes referred to as DCT), an image, or more properly samples representing an image, is divided into several image blocks with each block consisting of n samples arrayed in the horizontal direction and n samples arrayed in the vertical direction. That is, each block is formed of a spatial array consisting of n × n samples. The image data samples in each image block are processed by an orthogonal transform using the cosine function. The development of fast processing algorithms implemented on a single chip LSI circuit has enabled real time discrete cosine transformation of image data; and it now is not uncommon for DCT to be used for the transmission and/or recording of image data. Indeed, the discrete cosine transform yields an encoding efficiency that is practically equal to that of the Karhunen-Loeve transform which, in theory, is most favorable. The power concentration of the lower frequency components of the conversion coefficients produced by the discrete cosine transform is practically the same as that of the Karhunen-Loeve transform which, as is known, directly affects coding efficiency. By encoding only those components of the conversion coefficients having concentrated power, the amount of information, or data (i.e. the data volume) which need be transmitted or recorded for accurate representation of the original image is significantly reduced.
As an example of discrete cosine transformation, let it be assumed that an 8 × 8 block of image data samples is represented as follows:

    139  144  149  153  155  155  155  155
    144  151  153  156  159  156  156  156
    150  155  160  163  158  156  156  156
    159  161  161  162  162  155  155  155
    161  161  161  161  160  157  157  157
    162  162  161  163  162  157  157  157
    162  162  161  161  163  158  158  158

in which each number in this block represents the magnitude or signal level of the image data sample. When the discrete cosine transform of the 8 × 8 block of image data samples is derived, conversion coefficients C_ij (i represents row number and j represents column number) are produced as follows:

    314.91  -0.26  -3.02  -1.30   0.53  -0.42  -0.68   0.33
     -5.65  -4.37  -1.56  -0.79  -0.71  -0.02   0.11  -0.33
     -2.74  -2.32  -0.39   0.38   0.05  -0.24  -0.14  -0.02
     -1.77  -0.48   0.06   0.36   0.22  -0.02  -0.01   0.08
     -0.16  -0.21   0.37   0.39  -0.03  -0.17   0.15   0.32
      0.44  -0.05   0.41  -0.09  -0.19   0.37   0.26  -0.25
     -0.32  -0.09  -0.08  -0.37  -0.12   0.43   0.27  -0.19
     -0.65   0.39  -0.94  -0.46   0.47   0.30  -0.14  -0.11

in which the number representing each conversion coefficient represents the relative power of that conversion coefficient. The conversion coefficient C_00 is referred to as the DC component and represents the mean luminance value of the image block. It is seen that the electric power of the DC component is significantly higher than that of the other components which are known as AC components. As i increases, the frequency of the AC components in the vertical direction increases and as j increases, the frequency of the AC components in the horizontal direction increases. As both i and j increase, the frequency of the AC components in the diagonal direction increases.
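The transform described above can be illustrated by a minimal sketch, assuming the orthonormal form of the two-dimensional DCT-II. The function name `dct_2d` and the test blocks are hypothetical, and the scaling used here is an assumption that need not match the scaling behind the coefficient values listed above. The sketch shows that a block with no variation yields only a DC component, and that a block varying only in the horizontal direction yields nonzero coefficients only for i = 0.

```python
import math

def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square n x n block given as a list of rows."""
    n = len(block)

    def scale(k):
        # Orthonormal scaling (an assumption; other conventions exist).
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    # basis[k][x] = scale(k) * cos((2x + 1) * k * pi / (2n))
    basis = [[scale(k) * math.cos((2 * x + 1) * k * math.pi / (2 * n))
              for x in range(n)] for k in range(n)]

    coeffs = [[0.0] * n for _ in range(n)]
    for i in range(n):              # vertical frequency index
        for j in range(n):          # horizontal frequency index
            total = 0.0
            for y in range(n):      # sample row
                for x in range(n):  # sample column
                    total += block[y][x] * basis[i][y] * basis[j][x]
            coeffs[i][j] = total
    return coeffs

# Hypothetical test blocks, not taken from the example above.
flat = [[100] * 8 for _ in range(8)]                        # no variation
ramp = [[100 + 10 * x for x in range(8)] for _ in range(8)]  # horizontal ramp

flat_coeffs = dct_2d(flat)
ramp_coeffs = dct_2d(ramp)
print(round(flat_coeffs[0][0], 2))                # DC component of the flat block
print([round(c, 2) for c in ramp_coeffs[0]])      # only row i = 0 is nonzero
```

With this scaling the sum of the squared coefficients equals the sum of the squared samples, which corresponds to the earlier remark that the mean power of the conversion coefficients is substantially equal to the mean signal power before the transform.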
The DC component of the conversion coefficients exhibits the largest value and, thus, contains the most information. If the DC component is quantized with a large quantizing step, that is, if it is subjected to coarse quantization, block distortions are produced which appear as noise that is visually detected most readily in the video picture ultimately reproduced from the conversion coefficients, thereby deteriorating the quality of that picture. Consequently, to minimize such visual noise, the DC component of the conversion coefficients, namely C_00, is quantized with a small quantizing step and is represented by a larger number of bits, such as eight or more bits. A lesser number of bits may be used to represent the higher frequency AC components of the conversion coefficients C_ij (where i, j ≠ 0) because higher frequency AC components represent changes in the video information of the n × n block and the human eye does not readily detect detail in a rapidly changing image. Consequently, an observer will not sense a loss of detail in that portion of an image which changes from point to point. Therefore, it is not necessary to represent the higher frequency AC components of the conversion coefficients with a large number of bits. This means that a larger quantizing step can be used to quantize the higher frequency AC components of the conversion coefficients. An example of quantizing the conversion coefficients set out above is as follows:

    315.00   0.00  -3.00  -1.00   1.00   0.00  -1.00   0.00
     -6.00  -4.00  -2.00  -1.00  -1.00   0.00   0.00   0.00
     -3.00  -2.00   0.00   0.00   0.00   0.00   0.00   0.00
     -2.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
      0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
      0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00
     -1.00   0.00  -1.00   0.00   0.00   0.00   0.00   0.00

in which the quantizing is analogous to "rounding off" the conversion coefficients.
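The "rounding off" may be sketched as uniform quantization, in which each coefficient is mapped to the nearest integer multiple of a quantizing step. The following is a minimal sketch, assuming a single step applied to every coefficient; the function names `quantize` and `dequantize` are hypothetical. The two input rows are the first two rows of the conversion coefficients listed above, and a step of 1 yields the corresponding rows of the quantized example.

```python
import math

def quantize(coeffs, step):
    """Map each coefficient to the index of the nearest multiple of `step`."""
    return [math.floor(c / step + 0.5) for c in coeffs]

def dequantize(levels, step):
    """Reconstruct approximate coefficients from the quantized indices."""
    return [q * step for q in levels]

# First two rows of the conversion coefficients given above.
row0 = [314.91, -0.26, -3.02, -1.30, 0.53, -0.42, -0.68, 0.33]
row1 = [-5.65, -4.37, -1.56, -0.79, -0.71, -0.02, 0.11, -0.33]

print(quantize(row0, 1))   # [315, 0, -3, -1, 1, 0, -1, 0]
print(quantize(row1, 1))   # [-6, -4, -2, -1, -1, 0, 0, 0]

# A larger (coarser) step leaves fewer nonzero values to encode.
coarse = quantize(row0, 4)
print(coarse, dequantize(coarse, 4))
```

With a step of 4, only two values in the first row remain nonzero, which illustrates why a larger quantizing step produces less data at the cost of accuracy.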
In a practical transmission or recording scheme, the quantized conversion coefficients are encoded by variable length coding, such as Huffman coding or run-length coding, which provides further data compression. For proper transmission or recording, additional signals, such as synchronizing signals, parity codes, and the like, are added to the variable length coded conversion coefficients.
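Run-length coding benefits from the long runs of zeros left by quantization. The following is a minimal sketch, assuming the quantized coefficients are scanned row by row and that each nonzero value is emitted together with the count of zeros preceding it. The scan order, the `"EOB"` end marker and the function names are assumptions made for illustration, not the coding format of any particular recorder.

```python
def run_length_encode(levels):
    """Encode as (zero_run, value) pairs; trailing zeros collapse into 'EOB'."""
    pairs = []
    run = 0
    for v in levels:
        if v == 0:
            run += 1            # extend the current run of zeros
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")         # end of block: everything after is zero
    return pairs

def run_length_decode(pairs, length):
    """Rebuild the original sequence of `length` values."""
    out = []
    for p in pairs:
        if p == "EOB":
            break
        run, v = p
        out.extend([0] * run)
        out.append(v)
    out.extend([0] * (length - len(out)))   # restore trailing zeros
    return out

# First two rows of the quantized example above, followed by a row of zeros.
levels = [315, 0, -3, -1, 1, 0, -1, 0,
          -6, -4, -2, -1, -1, 0, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0]
encoded = run_length_encode(levels)
print(encoded)
print(run_length_decode(encoded, len(levels)) == levels)
```

The 24 input values reduce to ten pairs and one end marker. In practice the pairs themselves would then be assigned variable length codes, for example by Huffman coding.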
In digital recording, such as a digital video tape recorder (DVTR), the amount of data which is recorded to represent a vertical interval, such as a field interval or a frame interval, preferably is of a fixed length. That is, although the data representing a particular block of image data samples may be variable, the total amount of data used to represent a predetermined number of those blocks is fixed. If a predetermined number of image blocks is included in a sync block, then although the amount of data (or data volume) of one image block may be less than that of another, the data volume of all sync blocks is substantially constant. Since the data volume of a particular image block is determined by the conversion coefficients produced for that block, the conversion coefficients of some image blocks included in a sync block may be quantized with a higher quantizing step than the conversion coefficients of other image blocks. Of course, when a larger quantizing step is used, less data is produced; and as mentioned above, it is not uncommon to quantize the higher frequency AC components of the conversion coefficients with larger quantizing steps than are used to quantize the lower frequency AC components. Accordingly, if several different quantizers are connected in common, each exhibiting a different quantizing step, one quantizer may be used to quantize the conversion coefficients of one image block and another may be used to quantize the conversion coefficients of another image block. As a result, the overall data volume of the sync block can be optimized without exceeding the data volume capacity, or preset length, of the sync block. However, when different quantizers are selected for different image blocks, the identity of the quantizer which is used for a particular block must be transmitted or recorded. This identifying data does not represent useful image information and, thus, tends to increase the "overhead" in a sync block. 
This is an undesirable byproduct of selecting different quantizers in order to optimize data volume.
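The size of this overhead can be estimated with simple arithmetic. The following is a minimal sketch, assuming the quantizer identity is stored as a fixed-length binary code alongside each image block; the function name and the figures of 4 quantizers and 30 image blocks per sync block are hypothetical.

```python
def quantizer_id_overhead_bits(num_quantizers, blocks_per_sync_block):
    """Bits spent per sync block on identifying the quantizer of each block.

    Assumes a fixed-length identifier of ceil(log2(num_quantizers)) bits.
    """
    bits_per_block = (num_quantizers - 1).bit_length()
    return bits_per_block * blocks_per_sync_block

# Hypothetical figures: 4 selectable quantizers, 30 image blocks per sync block.
print(quantizer_id_overhead_bits(4, 30))   # 60 bits carrying no image data
# A single common quantizer needs no per-block identifier.
print(quantizer_id_overhead_bits(1, 30))   # 0
```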
To avoid the aforementioned "overhead", it is preferred to use the same quantizer having the same quantizing step for all of the n × n image blocks which are included in a sync block. It is expected that some image blocks will contain more changes in the image therein than others. Thus, the higher frequency AC components of the conversion coefficients in some image blocks will have a higher power concentration than in other image blocks. If these changes in the image data are referred to as the "visual activity" of the image block, then those image blocks having a higher visual activity will have a smaller concentration of lower frequency AC components of the conversion coefficients. Since, as mentioned above, the detail in that portion of an image containing rapid changes, that is, exhibiting a high visual activity, is not readily perceived, an image block having high activity can be quantized with a larger quantizing step without producing notable picture degradation. However, if an image block in the same sync block has low visual activity, that is, if the image block represents a monotonic picture pattern with a small dynamic range, the conversion coefficients are concentrated in the lower frequency AC components. If these conversion coefficients are quantized with a large quantizing step, that is, if they are subjected to the same coarse quantization that can be used for the high activity image blocks in that sync block, block distortions resulting in perceptible deterioration in the reproduced image are produced. Thus, although it is desirable to select a quantizing step that is uniform for all of the image blocks included in a sync block, if this quantizing step is too large, that portion of the image which does not contain high visual activity, that is, those image blocks which do not contain many changes, will be quantized into a number of bits that is not sufficient to represent that portion of the image properly. 
On the other hand, if a quantizer is selected with a relatively small quantizing step, then an image block which contains high visual activity, that is, a large number of changes therein, will be represented by an unnecessarily large number of bits which is inefficient and which may exceed the overall limit on the data volume in the sync block.
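The trade-off described above can be made concrete with a minimal sketch that searches for the smallest common quantizing step whose encoded size fits a fixed sync block capacity. Everything here is an assumption made for illustration: the function names, the crude bit-cost model in `estimated_bits`, the candidate steps, the capacity of 100 bits and the high activity coefficients are all hypothetical. Only the low activity values are taken from the first column of the conversion coefficients listed earlier.

```python
import math

def quantize(coeffs, step):
    """Map each coefficient to the index of the nearest multiple of `step`."""
    return [math.floor(c / step + 0.5) for c in coeffs]

def estimated_bits(levels):
    """Hypothetical cost model: every nonzero level costs a sign bit,
    its magnitude bits, and a 4-bit zero-run field."""
    return sum(1 + abs(q).bit_length() + 4 for q in levels if q != 0)

def choose_common_step(blocks, steps, capacity_bits):
    """Return (step, total_bits) for the smallest step that fits, else None."""
    for step in sorted(steps):
        total = sum(estimated_bits(quantize(b, step)) for b in blocks)
        if total <= capacity_bits:
            return step, total
    return None

# Low activity: first column of the coefficient example given earlier.
low_activity = [314.91, -5.65, -2.74, -1.77, -0.16, 0.44, -0.32, -0.65]
# High activity: made-up coefficients with strong AC components.
high_activity = [310.0, 42.0, -37.0, 29.0, -23.0, 17.0, -11.0, 7.0]

result = choose_common_step([low_activity, high_activity], [1, 2, 4, 8], 100)
print(result)
print(quantize(low_activity, result[0]))   # what survives in the quiet block
```

With these figures the search settles on a step of 4, since steps of 1 and 2 exceed the assumed capacity. At that step the low activity block retains only three nonzero values, which is the loss of low frequency detail that a uniformly coarse step imposes on quiet portions of the image.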