In the processing of images, the spatial representation of an image in, e.g., two-dimensions, is typically transformed into a signal representing the image in a different data space. Among the reasons for this transformation is the need to compress, or minimize, the amount of digital data required to adequately represent the image. Reduction of the data requirements enhances the speed at which the data can be communicated along fixed bandwidth communications channels and reduces the amount of memory required to store the image. Much of the following background discussion is taken from the excellent article by Wallace, G., "The JPEG Still Picture Compression Standard," Communications of the ACM, April, 1991, pp. 30-43, which is incorporated herein by reference.
In the basic case of a two dimensional grayscale image, the continuous (in space and amplitude) grayscale intensities across the two dimensions of the image are converted to an array of discrete pixels, each having a discrete intensity chosen from a set of typically fixed quantization levels. A typical digital image is made up of an array of n.sub.1 by n.sub.2 (e.g. 512 by 512) pixels, each one quantized to 256 intensity levels (which levels can be represented by eight digital signal bits). Each level is commonly denoted by an integer, with 0 corresponding to the darkest level and 255 to the brightest.
The sampling of the image can be conducted according to many techniques. A basic technique is to simply divide the entire gray scale intensity level into 256 levels, to measure the intensity at each of n.sub.1 locations across the image and for n.sub.2 rows down the image, and to assign to each of the pixels in the array the one of the 256 levels that is closest to the actual measured intensity.
The result of the sampling is a digital image that can be represented by an array of n.sub.1 by n.sub.2 integers with amplitudes between 0 and 255. Technically, to transmit the image, it is possible to simply transmit this data in a stream, one row after another. However, such treatment would require transmission of a large amount of data (eight bits per pixel). It is well known that significant portions of images are represented by pixels that are similar to each other, and that are correlated with each other. Significant savings in data requirements can be achieved by well-known techniques that exploit the redundancy that arises in many images.
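The sampling step and the raw data requirement described above can be illustrated with a minimal sketch. The function names and the assumption that the measured intensity is normalized to the range 0.0 to 1.0 are illustrative only and are not taken from the cited article.

```python
def quantize_intensity(value, levels=256):
    """Map a continuous intensity (assumed normalized to 0.0..1.0)
    to the nearest of `levels` integer levels, 0 being the darkest."""
    value = min(max(value, 0.0), 1.0)      # clamp out-of-range measurements
    return round(value * (levels - 1))


def raw_size_bits(n1, n2, bits_per_pixel=8):
    """Number of bits needed to send the image row by row, uncompressed."""
    return n1 * n2 * bits_per_pixel


if __name__ == "__main__":
    print(quantize_intensity(0.0), quantize_intensity(1.0))  # darkest, brightest
    print(raw_size_bits(512, 512))                            # bits for a 512 by 512 image
```

For the 512 by 512 example, the uncompressed stream is 2,097,152 bits, which is the quantity that the techniques discussed below seek to reduce.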
A well known technique is to transform the image from a spatial domain, i.e. a domain where an intensity amplitude is correlated to a spatial location in the n.sub.1 and n.sub.2 dimensions, to another domain, such as a spatial frequency domain. In the spatial frequency domain, the image is represented by amplitudes at spatial frequencies. Spatial frequencies can be considered to be related to the degree of change in intensity from one pixel to the next along either the horizontal or vertical dimensions (or combinations thereof). For instance, an image of vertical bars of a certain width alternating from black to white will be characterized by a set of spatial frequencies. For a similar image with narrower bars, the set of spatial frequencies differs indicating the higher frequency of change from one pixel to the next. Similarly, variations in the vertical dimension also bring about similar changes. Coding techniques which transform the spatial image into another domain, such as a frequency domain, are generally referred to as "transform" coding techniques.
A typical transform coding technique is known as the DCT or discrete cosine transform technique. The name derives from the fact that a cosine function is applied to signal elements that are discrete in space, rather than a spatially continuous signal.
DCT compression can be thought of as the compression of a stream of t total sample blocks, 102.sub.1, 102.sub.2, 102.sub.3, . . . , 102.sub.n, . . . 102.sub.t, of the original image 104, shown schematically in FIG. 1, in n.sub.1 and n.sub.2. Each block 102.sub.n is made up of an array of contiguous pixels 106. A typical block size is an 8 by 8 pixel block, for a total of 64 pixels. In the case shown in FIG. 1, t=4,096.
FIG. 2 shows the essential steps of a transform coding technique, such as a DCT for a single component, such as a gray scale amplitude. Each 8.times.8 block 102.sub.n is input to the encoder 120. The data is passed to a DCT operator 122. In the DCT 122, the transform function is applied to the data in each block 102.sub.n according to the following well known relation: ##EQU1##
In the above expression, F(k.sub.1,k.sub.2) is the transformed function in the variables k.sub.1 and k.sub.2 and f(n.sub.1,n.sub.2) is the amplitude pattern of the original function in the block 102.sub.n as a function of n.sub.1 and n.sub.2, the spatial dimensions.
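The relation itself is not reproduced in this text. For orientation only, one commonly used form of the 8.times.8 forward DCT, which is the form given in the Wallace article cited above, can be written in the symbols defined here as follows. The normalization factor C(k) is part of that form and is an assumption with respect to the relation actually referred to above, which may use a different scaling.

```latex
F(k_1,k_2) = \frac{1}{4}\, C(k_1)\, C(k_2)
  \sum_{n_1=0}^{7} \sum_{n_2=0}^{7} f(n_1,n_2)\,
  \cos\frac{(2n_1+1)\,k_1\,\pi}{16}\,
  \cos\frac{(2n_2+1)\,k_2\,\pi}{16},
\qquad
C(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k \neq 0 \end{cases}
```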
The DCT 122 can be considered as a harmonic analyzer, i.e. a component that analyzes the frequency components of its input signal. Each 8.times.8 block 102.sub.n is effectively a 64 point discrete signal which is a function of the two spatial dimensions n.sub.1 and n.sub.2. The DCT 122 decomposes the signal into 64 orthogonal basis signals, each containing one of the 64 unique two-dimensional (2-D) "spatial frequencies," which comprise the input signal's "spatial spectrum." The output of the DCT 122 is referred to herein as a "transform block" and is the set of 64 basis-signal amplitudes or "DCT coefficients" whose values are uniquely determined by the particular 64 point input signal, i.e. by the pattern of intensities in block 102.sub.n. Although the DCT transform is being used herein for explanatory purposes, other transforms, including sub-band transforms are applicable and may be used in conjunction with the invention.
The DCT coefficient values can thus be regarded as the relative amounts of the 2-D spatial frequencies contained in the 64 point input signal 102.sub.n. The coefficient with zero frequency in both dimensions n.sub.1 and n.sub.2 is called the "DC coefficient" and the remaining 63 coefficients are called the "AC coefficients."
Experience has shown that with typical images 104, sample values from pixel to pixel typically vary slowly across an image. Thus, data compression is possible by allocating many of the available bits of the digital signal to the lower spatial frequencies, which correspond to this slow spatial variation. For a typical source block 102.sub.n, many of the spatial frequencies have zero or near-zero amplitude and need not be encoded. FIG. 3 shows schematically a typical graphical representation of a transform block 112.sub.n, which is the result of transforming a block 102.sub.n by DCT. Zero amplitude coefficients are indicated by white and non-zero amplitude coefficients are indicated by shading. A small shaded region 114 centered around the origin is surrounded by a large white region 116. This type of pattern typically arises for every 8.times.8 block.
After transformation in the DCT 122, several other operations take place in quantizer 124 and encoder 126, which operations are explained below in more detail. It is appropriate to mention these operations now, so that the encoder will be fully understood. Basically, these operations further transform and encode the transformed signal F(k.sub.1, k.sub.2) output from the DCT. From the DCT encoder 120, the signal is communicated over a communication channel, or stored, or otherwise treated. Eventually the compressed image data is input to DCT decoder 130, which includes a decoder 136, a dequantizer 134 and an inverse DCT ("IDCT") 132. The decoder 136 and dequantizer 134 reverse the effects caused by encoder 126 and quantizer 124, respectively. The output of dequantizer 134 is thus a function in the transformed data space (k.sub.1, k.sub.2) and typically is in the form of an 8.times.8 block also. In the IDCT 132, an inverse transformation is applied to the data in each transform block. The transformation is designed to undo the effects of the forward transformation set out above, and its characteristics are well known to those of ordinary skill in the art.
The output of the IDCT is a reconstruction of an 8.times.8 image signal in the spatial dimensions of n.sub.1 and n.sub.2, generated by summing the basis signals. Mathematically, the DCT is a one-to-one mapping of 64 values between the image and the frequency domains. If the DCT and IDCT could be computed with perfect accuracy and if the DCT coefficients were not quantized as in quantizer 124 (and thus subsequently dequantized), the original 64 point signal 102.sub.n could be exactly recovered. In principle, the DCT introduces no loss to the source image samples; it merely transforms them to a domain in which they can be more efficiently encoded.
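The statement that the transform by itself introduces no loss can be checked numerically. The following is a minimal sketch, assuming the normalized 8.times.8 DCT form used in the Wallace article (a factor of 1/4 and a factor of 1/sqrt(2) for zero-frequency terms). The names dct2 and idct2 are hypothetical, and the direct summation is chosen for clarity rather than speed.

```python
import math

N = 8  # block size assumed throughout this sketch

# COS[k][n] = cos((2n + 1) * k * pi / 16), shared by both directions.
COS = [[math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
       for k in range(N)]


def _c(k):
    """Normalization factor for the zero-frequency term."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0


def dct2(block):
    """Forward 8x8 DCT of block[n1][n2], returning F[k1][k2]."""
    return [[0.25 * _c(k1) * _c(k2) * sum(block[n1][n2] * COS[k1][n1] * COS[k2][n2]
                                          for n1 in range(N) for n2 in range(N))
             for k2 in range(N)] for k1 in range(N)]


def idct2(coeffs):
    """Inverse 8x8 DCT: sums the 64 weighted basis signals."""
    return [[0.25 * sum(_c(k1) * _c(k2) * coeffs[k1][k2] * COS[k1][n1] * COS[k2][n2]
                        for k1 in range(N) for k2 in range(N))
             for n2 in range(N)] for n1 in range(N)]


if __name__ == "__main__":
    block = [[(3 * i + 5 * j * j) % 256 for j in range(N)] for i in range(N)]
    restored = idct2(dct2(block))
    error = max(abs(restored[i][j] - block[i][j])
                for i in range(N) for j in range(N))
    print(error)  # limited only by floating point accuracy
```

With this normalization, a block of constant intensity yields a DC coefficient equal to eight times that intensity and AC coefficients that are zero to within floating point accuracy.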
Returning to the description of the encoding stage of the coding process, after output from the DCT 122, each of the, in this case, 64 DCT coefficients, is quantized with reference to a quantization, or reconstruction level table. One purpose of quantization is to discard information that is not visually significant. Another purpose of quantization is to obviate the need for an infinite number of bits, which would be the case if quantization were not used. In quantization, each coefficient is compared to the entries in the quantization table, and is replaced by the entry that is closest to the actual coefficient. Assume, for explanation purposes only, that uniform length codewords are used. If 8 bits are used, then each coefficient can be quantized into the closest of 256 quantization or reconstruction levels. If only 7 bits are used, then only 128 reconstruction levels are available. Typically, however, the codewords will not be of uniform length.
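The nearest-level rule described above can be sketched as follows. The evenly spaced table and the coefficient range used here are hypothetical, chosen only to show how the number of bits fixes the number of reconstruction levels; practical tables are typically not uniform.

```python
def uniform_levels(lowest, highest, bits):
    """Hypothetical reconstruction table: 2**bits evenly spaced levels."""
    count = 2 ** bits
    step = (highest - lowest) / (count - 1)
    return [lowest + i * step for i in range(count)]


def quantize(coefficient, levels):
    """Replace a coefficient by the closest entry of the table."""
    return min(levels, key=lambda level: abs(level - coefficient))


if __name__ == "__main__":
    print(len(uniform_levels(-1024, 1023, 8)))  # levels available with 8 bits
    print(len(uniform_levels(-1024, 1023, 7)))  # levels available with 7 bits
    print(quantize(3.4, [0, 2, 4, 6]))          # closest entry of a small table
```

Because many different coefficient values map to the same table entry, the operation cannot be inverted exactly.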
The output of the quantizer 124 can be normalized. At dequantizer 134, the effect of any normalization is reversed. The output from the dequantizer is the set of quantized values set forth in the quantization table used by quantizer 124.
If the aim of a particular signal processing task is to compress the image as much as possible without visible artifacts, each step size in the quantization table should ideally be chosen as a perceptual threshold or "just noticeable difference" for the visual contribution of its corresponding cosine basis function. These thresholds are also functions of the source image characteristics, display characteristics and viewing distance. For applications in which these variables can be reasonably well defined, psychovisual experiments can be performed to determine the best thresholds. Quantization is a many-to-one mapping, and therefore it is fundamentally lossy. It is the principal source of lossiness in DCT-based encoders.
After quantization, the DC coefficient is typically treated separately from the 63 AC coefficients, although, for purposes of the present invention, the DC component can also be considered together with the other coefficients. The DC coefficient is a measure of the average value of the 64 image samples. Because there is usually strong correlation between the DC coefficients of adjacent source blocks 102.sub.n, the quantized DC coefficient is typically encoded as the difference from the DC term of the previous transform block in the encoding order. The encoding order for blocks 102.sub.n is typically a sinuous path along the n.sub.1 direction, moving up one block at the end of a row of blocks, and so on, until the final block in the final row is reached. This is shown by the path S in FIG. 1. Other block encoding orders are also possible, such as a vertical sinuous pattern, or a sinuous pattern that has increasingly longer runs extending generally at a 45.degree. angle to both axes. The differential treatment of the DC coefficients is beneficial because the difference between DC values takes up less energy than the DC values themselves, so fewer bits are required to encode the differences than to encode the DC values themselves.
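The differential treatment of the DC coefficients can be sketched as below. The starting predictor of zero for the first block and the sample values are assumptions made for illustration.

```python
def dc_differences(dc_values, predictor=0):
    """Replace each quantized DC value by its difference from the DC
    value of the previous block in the encoding order."""
    differences = []
    previous = predictor  # assumed starting value for the first block
    for dc in dc_values:
        differences.append(dc - previous)
        previous = dc
    return differences


def dc_reconstruct(differences, predictor=0):
    """Decoder side: accumulate the differences to recover the DC values."""
    values = []
    previous = predictor
    for difference in differences:
        previous += difference
        values.append(previous)
    return values


if __name__ == "__main__":
    dc = [800, 804, 801, 801]          # hypothetical DC values of adjacent blocks
    diffs = dc_differences(dc)
    print(diffs)                        # [800, 4, -3, 0]
    print(dc_reconstruct(diffs) == dc)  # True
```

After the first block, the differences are small numbers clustered around zero, which is what allows them to be coded with fewer bits.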
After the DC coefficients have been coded, the set of AC coefficients is ordered, typically, in a zig-zag pattern, as shown in FIG. 4. This ordering begins with the low spatial frequency components, in both the k.sub.1 and k.sub.2 dimensions, and proceeds to the higher frequency components. As has been mentioned, the transformed images are typically characterized by relatively many large amplitude low frequency components and relatively many small amplitude higher frequency components. This zig-zag ordering helps, in the case of runlength encoding, to implement a coding of the locations in the transformed domain of the large and small amplitude coefficients.
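One way to generate a zig-zag ordering is to walk the anti-diagonals of the block and alternate the direction of travel. The sketch below assumes the traversal convention commonly used in JPEG, which may differ in orientation from the pattern of FIG. 4; the function name is hypothetical.

```python
def zigzag_order(n=8):
    """Return the (k1, k2) index pairs of an n by n block in zig-zag order,
    starting at the zero-frequency (DC) position."""
    order = []
    for s in range(2 * n - 1):                      # s = k1 + k2 on each anti-diagonal
        diagonal = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            diagonal.reverse()                       # alternate the direction of travel
        order.extend(diagonal)
    return order


if __name__ == "__main__":
    order = zigzag_order()
    print(order[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
    print(order[1:4])  # the first three AC positions after the DC term
```

The first entry is the DC position; the remaining 63 entries give the order in which the AC coefficients are visited.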
Typically, the coefficient amplitude is compared with a threshold. Above the threshold, the amplitude is selected to be coded. Below the threshold, the amplitude is not selected to be coded and is set to zero. For discussion purposes, coefficients having amplitudes below the threshold are referred to as "non-selected" coefficients and sometimes as coefficients having zero amplitude, while coefficients having amplitudes above the threshold are referred to as "selected" coefficients, or as having non-zero amplitude. It is understood, however, that many of the so-called zero amplitude coefficients have small, non-zero amplitudes.
Because many of the coefficients are not selected, it is more efficient to code the location information by identifying which of the 63 AC coefficients are to be selected, and the value of their quantized coefficient, rather than digitally coding a coefficient value for each of the 64 spatial frequencies. The coefficients are identified by their locations in the ordered set defined by the transformed domain. Two techniques are known for coding which of the coefficients are non-zero. One technique is referred to as "runlength coding" and the other is referred to as "vector-quantization." Runlength encoding exploits the ordering imposed on the coefficients by the zig-zag pattern while vector-quantization exploits only the ordering inherent in the dimensional arrangement of the transformed domain.
According to the method of runlength encoding, the positions along the zig-zag path which have a selected coefficient are specified, based on their location relative to the previous selected coefficient. As shown in FIG. 5, in transformed block 112.sub.n, six AC coefficients are selected. These are the second, fifth, sixteenth, seventeenth, twenty-fourth and fortieth along the zig-zag path, not counting the DC coefficient. One way to encode this is to transmit a digital codeword that signifies the number of non-selected coefficients between selected coefficients (and thus the locations of the selected coefficient), along with a stream of codewords that signify the quantized amplitudes of the selected coefficients.
For instance, in FIG. 5, an intermediate codestring that signifies the number of non-selected coefficients between selected coefficients would be 1,2,10,0,6,15,23 (or, more typically, "end of block" in place of the final 23). Starting with the DC coefficient as an origin, the number of non-selected coefficients before the first selected coefficient (the second in the string) is one. The number of intervening non-selected coefficients between the first selected coefficient and the second (the fifth in the string) is two. The number between the second and the third (the sixteenth) is ten. The number between the third and the fourth (the seventeenth) is zero, and so on. After the last selected coefficient, twenty-three non-selected coefficients arise before the end of the block. It is possible to signify this run of twenty-three non-selected coefficients with the codeword "23". Alternatively, and more efficiently, a special codeword is allotted for the end of the block, to signify that the rest of the coefficients (after the sixth selected coefficient) are non-selected. (Another method, rather than counting the number of intervening non-selected coefficients, is to specify the location of the next selected coefficient. Both methods amount to basically the same thing.)
This intermediate codestring (1,2,10,0,6,15, end of block) must be digitally represented, typically simply by coding in binary for each runlength. The shortest possible runlength is zero and the longest possible runlength is sixty-three, so according to the simplest scheme, six bits are necessary to specify each runlength. Other schemes, using a variable length codeword to specify each runlength are known as "entropy" coding, and are discussed below.
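The derivation of the intermediate codestring for the FIG. 5 example can be sketched as follows. Positions are counted along the zig-zag path starting at 1 for the first AC coefficient, as in the text; the function names and the "EOB" marker are hypothetical.

```python
END_OF_BLOCK = "EOB"  # hypothetical marker for the special end-of-block codeword


def runs_from_positions(selected, total=63, use_eob=True):
    """Convert ascending 1-based positions of selected AC coefficients
    into runs of non-selected coefficients."""
    runs = []
    previous = 0                          # the DC coefficient acts as the origin
    for position in selected:
        runs.append(position - previous - 1)
        previous = position
    runs.append(END_OF_BLOCK if use_eob else total - previous)
    return runs


def positions_from_runs(runs, total=63):
    """Decoder side: recover the positions of the selected coefficients."""
    positions = []
    previous = 0
    for run in runs:
        if run == END_OF_BLOCK:
            break
        nxt = previous + run + 1
        if nxt > total:                   # trailing run reaching the end of the block
            break
        positions.append(nxt)
        previous = nxt
    return positions


if __name__ == "__main__":
    fig5 = [2, 5, 16, 17, 24, 40]
    print(runs_from_positions(fig5, use_eob=False))  # [1, 2, 10, 0, 6, 15, 23]
    print(runs_from_positions(fig5))                 # [1, 2, 10, 0, 6, 15, 'EOB']
    print(positions_from_runs(runs_from_positions(fig5)) == fig5)  # True
```

Under the simplest scheme of six bits per runlength, the seven-entry string without the end-of-block codeword occupies 42 bits.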
The foregoing coding only codes for the positions of the selected coefficients along the zig-zag path. It is also necessary to code the amplitudes of the coefficients. (The invention only relates specifically to coding the location information.) Typically, the information that identifies the locations of the selected coefficients takes up more than one-half of the data that must be transmitted. As has been mentioned, for instance in the special case of uniform codewords discussed above, the amplitudes of the coefficients have been quantized into codewords of uniform word length, in the case discussed, 8 bits. Thus, according to one technique, if the first codewords to be sent are those which code for the position (and quantity) of the selected coefficients, a second stream of codewords can be sent for the amplitudes. Thus, in a most rudimentary method, for the example shown in FIG. 5, it would be necessary to send seven codewords for the locations (including one for "end of block") and six codewords for the amplitudes, for a total of thirteen codewords.
In entropy coding, an advantage is gained from recognition of the fact that in typical images, the probability that a certain length of run will arise varies, depending on the length. For instance, a run of sixty-two is very common, signifying a single selected coefficient in the first position, with all other coefficients being non-selected. Similarly, other long runs are more likely than runs of moderate length, such as thirty-three. Runs of very short length, e.g. zero, one and two, are also highly likely, because selected coefficients tend to be clustered at the beginning of the zig-zag pattern. Thus, in entropy coding, an estimation is made of the probability that a certain value, in this case a runlength, will be the value that is desired to be coded, and the values are ordered according to probability, from highest to lowest. Next, a set of codewords (also known as a "codebook") is developed with codewords of different lengths (number of bits). The codewords having the shortest lengths are assigned to the runlengths of highest probability, and the codewords of longer lengths are assigned to the runlengths of lesser probability. Thus, it will typically take fewer bits to specify a series of runs, because the most probable runs are specified with the shortest codewords.
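The assignment of shorter codewords to more probable values can be illustrated with a minimal sketch of the standard Huffman construction, which repeatedly merges the two least probable entries. The relative weights used below are hypothetical and are not measured runlength statistics.

```python
import heapq
from itertools import count


def huffman_code_lengths(weights):
    """Given a dict of symbol -> relative weight, return a dict of
    symbol -> codeword length produced by Huffman's merging procedure."""
    tiebreak = count()                       # keeps heap entries comparable
    heap = [(w, next(tiebreak), [sym]) for sym, w in weights.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in weights}
    if len(heap) == 1:                       # a lone symbol still needs one bit
        lengths[heap[0][2][0]] = 1
    while len(heap) > 1:
        w1, _, group1 = heapq.heappop(heap)  # two least probable groups
        w2, _, group2 = heapq.heappop(heap)
        for sym in group1 + group2:
            lengths[sym] += 1                # each merge adds one bit to its members
        heapq.heappush(heap, (w1 + w2, next(tiebreak), group1 + group2))
    return lengths


if __name__ == "__main__":
    # Hypothetical relative frequencies of runlengths 0 through 5.
    weights = {0: 8, 1: 2, 2: 2, 3: 2, 4: 1, 5: 1}
    print(huffman_code_lengths(weights))
    # {0: 1, 1: 3, 2: 3, 3: 3, 4: 4, 5: 4}
```

The most probable runlength receives a one-bit codeword, while the two least probable runlengths receive four-bit codewords.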
One complication of entropy, or variable length codeword coding arises from the fact that the codewords are not all the same length. The decoder must have some way of identifying when in a stream of "1"s and "0"s one codeword ends and the next begins. When all of the codewords are the same length, the decoder simply starts analyzing a codeword at the end of a fixed number of bits. Several techniques for variable length coding are practiced. Typical are Huffman coding and arithmetic coding. Huffman coding is discussed in detail in Lim, J. S., Two-Dimensional Signal and Image Processing, Prentice Hall, Englewood Cliffs, N.J. (1990), pp. 613-616, and Huffman, D. A., "A method for the construction of minimum redundancy codes," Proceedings IRE, vol. 40, 1952, pp. 1098-1101, both of which are incorporated herein by reference. Arithmetic coding is discussed at Pennebaker, W. B., Mitchell, J. L., et al., "Arithmetic coding articles," IBM J. Res. Dev. 32, 6 Special Issue (Nov. 1988), 717-774, which is incorporated herein by reference. Only Huffman coding is discussed herein, and that only cursorily.
According to a Huffman coding scheme, a codebook is established such that as a stream of bits is analyzed, it is unambiguous when a codeword has been completed. For instance, a typical stream of bits could be 100011011101011111. The codebook from which this stream of code was constructed is shown schematically in FIG. 6b. For explanatory purposes, this codebook is very small, including only six entries. However, it must be understood that the codewords listed are the entire set of codewords for this codebook. In other words, neither 1, 111, nor 10 are valid codewords. Thus, analyzing the string of bits, from left to right, the first bit is "1", which is not a codeword. Combining the next bit to have "10", is still not a valid codeword. Combining the next bit produces "100", which is a valid codeword, signifying the runlength of 1. Starting over again, the next bit is "0", which is a valid codeword and signifies a runlength of 0. Starting over again, the next bit is 1, not a valid codeword, followed by another 1, producing "11", still not a valid codeword. Appending the next "0" produces "110", the codeword for a run of length 2. Applying the same process for the rest of the string shows that the order of the runlengths was 1, 0, 2, 4, 3, 5.
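The bit-by-bit analysis described above can be sketched as follows. FIG. 6b is not reproduced in this text, so the codebook below is inferred from the worked example: it is the only six-entry assignment consistent with the stated stream, the stated result, and the statement that 1, 10 and 111 are not codewords. It should be read as a reconstruction under that assumption.

```python
# Codeword -> runlength, inferred from the worked example (an assumption,
# since the figure itself is not available here).
CODEBOOK = {"0": 0, "100": 1, "110": 2, "101": 3, "1110": 4, "1111": 5}


def decode(bits, codebook=CODEBOOK):
    """Scan the stream left to right, emitting a runlength as soon as the
    accumulated bits form a valid codeword, then starting over."""
    runlengths = []
    current = ""
    for bit in bits:
        current += bit
        if current in codebook:
            runlengths.append(codebook[current])
            current = ""
    if current:
        raise ValueError("bit stream ends in the middle of a codeword")
    return runlengths


if __name__ == "__main__":
    print(decode("100011011101011111"))  # [1, 0, 2, 4, 3, 5]
```

The greedy scan is unambiguous only because no codeword is the beginning of another codeword, which is the property the codebook is constructed to have.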
Use of Huffman coding will result in codewords of very long lengths, for those very rare runlengths. If the table is properly constructed, however, it will result in the use of short codewords most often. Huffman coding requires that the probabilities of various runlengths be known, or at least estimated to some reasonable certainty. It also requires the storage of the codebook and accessibility to it by both the encoder 126 and decoder 136, which is not required if the codebook is simply the digital representation of the runlength, as set forth in FIG. 6a. In that case, no codebook per se is necessary, although it is still necessary to provide instructions for both the encoder and the decoder to use that sort of a coding scheme.
Analysis of the runlength method of encoding the location of selected coefficients reveals that it is highly efficient when the lengths of the runs are long, and it is only necessary to specify the location of a few coefficients. However, when the lengths of runs are short, and it is necessary to specify the location of relatively many coefficients, runlength encoding is not very efficient, since both the location and the amplitudes must be specified for many values. Additionally, for large blocks, such as an 8.times.8 block, there is a significant difference in the probability that a certain run will arise beginning on the second coefficient, as compared to beginning on the thirtieth coefficient. For instance, the probability that any run longer than thirty-four will start on the thirtieth coefficient is zero. However, the typical Huffman coding does not take this into account, and allocates the shorter codewords to those runlengths having the highest probability of starting at a coefficient near to the low frequency region of the block.
Another method of encoding the locations of selected coefficients is also used. This method is not used in conjunction with run-length coding. This method is known as "vector quantization," because it treats the representation of the transform block 112.sub.n shown in FIG. 5 as a two-dimensional vector. Rather than specifying the differences between locations of selected coefficients, vector quantization specifies the pattern of selected coefficients. Each possible pattern is given a unique codeword. The actual pattern is matched to a list of patterns, and the codeword is transmitted.
For instance, as shown in FIG. 7a, the pattern of selected coefficients is different from the pattern set forth in FIG. 5. Similarly, the pattern in FIG. 7b differs from both. For an 8.times.8 block, there are 2.sup.64 or approximately 1.8.times.10.sup.19 different patterns, a very large number. It is possible, although burdensome, to specify all of the possible patterns in the codebook, as shown schematically in FIG. 8, and to assign to each of the 2.sup.64 patterns a unique 64 bit codeword. Thus, to transmit the coded patterns only requires sending a single, in this case 64 bit codeword, rather than the several codewords typically necessary to specify a pattern using runlength coding. Independent of how many selected coefficients there are, there will be only a single codeword transmitted to specify their pattern, and thus their location. This is an equivalent method to giving one bit to each coefficient, just to indicate if the amplitude of that coefficient is non-zero or not. The codebook must be stored and accessible to both the encoder 126 and the decoder 136. The amount of memory required to store the codebook is very large, given the large number of patterns and the high number of bits necessary to code each pattern.
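The equivalence noted above, between a uniform 64 bit pattern codeword and one bit per coefficient, can be sketched as follows. Positions are numbered 0 through 63 in some fixed ordering of the block; that numbering and the function names are assumptions made for illustration.

```python
def pattern_to_codeword(selected_positions):
    """Pack a pattern of selected coefficients into one 64 bit integer,
    with bit p set when the coefficient at position p is selected."""
    word = 0
    for position in selected_positions:
        if not 0 <= position < 64:
            raise ValueError("position outside the 8 by 8 block")
        word |= 1 << position
    return word


def codeword_to_pattern(word):
    """Recover the selected positions from the 64 bit codeword."""
    return [p for p in range(64) if (word >> p) & 1]


if __name__ == "__main__":
    pattern = [0, 2, 5, 16, 17, 24, 40]   # hypothetical pattern, DC at position 0
    word = pattern_to_codeword(pattern)
    print(format(word, "064b"))           # one bit per coefficient
    print(codeword_to_pattern(word) == pattern)  # True
    print(2 ** 64)                        # number of distinct patterns
```

With this direct mapping no stored table is needed; the memory burden described above arises when arbitrary, and in particular variable length, codewords are assigned to the patterns.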
As in the case of runlength coding, it is possible, and more typical, to use a variable length codeword codebook, rather than the uniform codeword length book shown schematically in FIG. 8. In that case, the probability of different patterns is determined, or estimated, and patterns of highest probability receive codewords using the least number of bits. Thus, to specify highly probable patterns, a codeword of only a few bits must be sent, rather than the 64 bits of a uniform length codeword system, or the number required by runlength encoding. It will also be understood that because some possible strings of bits will not be valid codewords, due to the necessity to specify the boundaries between codewords of variable lengths, the codewords for the least probable patterns will require many more than 64 bits, for example, perhaps as many as 160.
Thus, although vector quantization may result in a very efficient coding of most patterns, it requires a very large table to be generated and stored to have the capability of encoding all patterns, thus requiring that the apparatus have a very large amount of memory and computations, impractical for many situations.
It is possible to break up the 8.times.8 coefficients into several segments, but this results in a loss of efficiency because dividing the block into regions makes it difficult to exploit the correlation between regions.
Thus, the several objects of the invention include providing a method and an apparatus for the encoding of one-dimensional and multi-dimensional signals, including the position within a block of selected coefficients, such as transform coefficients: which does not require an inordinately high amount of memory for the storage of codewords; which facilitates codewords of a relatively small bit rate; which facilitates assigning different codewords to the same runlength depending on the location of the starting coefficient of the run; which is more efficient in terms of bits used per coefficient than runlength encoding; which requires less computation and less use of memory than vector quantization; and which can be implemented straightforwardly.