This invention relates to transform coding of digital data, specifically to real domain processing of transform data and, more particularly, to a shift and/or merge of transformed data which increases the speed of, for example, processing of color images printed by color printers. The invention implements an efficient two-dimensional method for merging and shifting JPEG (Joint Photographic Experts Group) images in the Discrete Cosine Transform (DCT) domain. Since each dimension is handled by one-dimensional methods, the shift or merge amounts are independent for the two axes.
Transform coding is the name given to a wide family of techniques for data coding, in which each block of data to be coded is transformed by some mathematical function prior to further processing. A block of data may be a part of a data object being coded, or may be the entire object. The data generally represent some phenomenon, which may be for example a spectral or spectrum analysis, an image, an audio clip, a video clip, etc. The transform function is usually chosen to reflect some quality of the phenomenon being coded; for example, in coding of audio, still images and motion pictures, the Fourier transform or Discrete Cosine Transform (DCT) can be used to analyze the data into frequency terms or coefficients. Given the phenomenon being compressed, there is generally a concentration of the information into a few frequency coefficients. Therefore, the transformed data can often be more economically encoded or compressed than the original data. This means that transform coding can be used to compress certain types of data to minimize storage space or transmission time over a communication link.
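As a rough illustration of the concentration of information described above, the following sketch applies an orthonormal one-dimensional DCT-II to a smooth block of eight samples and measures how much of the signal energy lands in the first two coefficients. The sample values and the helper names `dct_1d` and `energy_fraction` are assumptions made for illustration only; they are not taken from any standard.

```python
import math

def dct_1d(samples):
    """Orthonormal DCT-II of a list of samples (illustrative helper)."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                    for j, x in enumerate(samples))
        coeffs.append(scale * total)
    return coeffs

def energy_fraction(coeffs, count):
    """Fraction of the total energy held by the first `count` coefficients."""
    total = sum(c * c for c in coeffs)
    return sum(c * c for c in coeffs[:count]) / total

# Assumed example data: a smooth ramp, typical of a slowly varying image row.
block = [100, 102, 104, 106, 108, 110, 112, 114]
coeffs = dct_1d(block)
fraction = energy_fraction(coeffs, 2)   # close to 1 for smooth data
```

For smooth data such as this ramp, nearly all of the energy falls into the lowest-frequency terms, which is what makes the remaining coefficients cheap to encode.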
An example of transform coding in use is found in the Joint Photographic Experts Group (JPEG) international standard for still image compression, as defined by ITU-T Rec. T.81 (1992)|ISO/IEC 10918-1:1994, Information technology - Digital compression and coding of continuous-tone still images, Part 1: Requirements and Guidelines. Another example is the Moving Pictures Experts Group (MPEG) international standard for motion picture compression, defined by ISO/IEC 11172:1993, Information Technology - Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbits/s. This MPEG-1 standard defines systems for both video compression (Part 2 of the standard) and audio compression (Part 3). A more recent MPEG video standard (MPEG-2) is defined by ITU-T Rec. H.262|ISO/IEC 13818-2:1996, Information Technology - Generic Coding of moving pictures and associated audio - Part 2: video. A newer audio standard is ISO/IEC 13818-3:1996, Information Technology - Generic Coding of moving pictures and associated audio - Part 3: audio. All three international image data compression standards use the DCT on 8×8 blocks of samples to achieve image compression. DCT compression of images is used herein to give illustrations of the general concepts put forward below; a complete explanation can be found in Chapter 4, "The Discrete Cosine Transform (DCT)", in W. B. Pennebaker and J. L. Mitchell, JPEG: Still Image Data Compression Standard, Van Nostrand Reinhold: New York, (1993).
Wavelet coding is another form of transform coding. Special localized basis functions allow wavelet coding to preserve edges and small details. For compression the transformed data is usually quantized. Wavelet coding is used for fingerprint identification by the Federal Bureau of Investigation (FBI). Wavelet coding is a subset of the more general subband coding technique. Subband coding uses filter banks to decompose the data into particular bands. Compression is achieved by quantizing the lower frequency bands more finely than the higher frequency bands while sampling the lower frequency bands more coarsely than the higher frequency bands. A summary of wavelet, DCT, and other transform coding is given in Chapter 5, "Compression Algorithms for Diffuse Data", in Roy Hoffman, Data Compression in Digital Systems, Chapman and Hall: New York, (1997).
In any technology and for any phenomenon represented by digital data, the data before a transformation is performed are referred to as being "in the real domain". After a transformation is performed, the new data are often called "transform data" or "transform coefficients", and referred to as being "in the transform domain". The function used to take data from the real domain to the transform domain is called the "forward transform". The mathematical inverse of the forward transform, which takes data from the transform domain to the real domain, is called the respective "inverse transform".
In general, the forward transform will produce real-valued data, not necessarily integers. To achieve data compression, the transform coefficients are converted to integers by the process of quantization. Suppose that (λ_i) is a set of real-valued transform coefficients resulting from the forward transform of one unit of data. Note that one unit of data may be a one-dimensional or two-dimensional block of data samples or even the entire data. The "quantization values" (q_i) are parameters to the encoding process. The "quantized transform coefficients" or "transform-coded data" are the sequence of values (a_i) defined by the quantization function Q:

$$a_i = Q(\lambda_i) = \left\lfloor \frac{\lambda_i}{q_i} + 0.5 \right\rfloor, \qquad (1)$$

where ⌊x⌋ means the greatest integer less than or equal to x.
The resulting integers are then passed on for possible further encoding or compression before being stored or transmitted. To decode the data, the quantized coefficients are multiplied by the quantization values to give new "dequantized coefficients" (λ_i′) given by

$$\lambda_i' = q_i a_i. \qquad (2)$$
The process of quantization followed by de-quantization (also called inverse quantization) can thus be described as "rounding to the nearest multiple of q_i". The quantization values are chosen so that the loss of information in the quantization step is within some specified bound. For example, for audio or image data, one quantization level is usually the smallest change in data that can be perceived. It is quantization that allows transform coding to achieve good data compression ratios. A good choice of transform allows quantization values to be chosen which will significantly cut down the amount of data to be encoded. For example, the DCT is chosen for image compression because the frequency components which result produce almost independent responses from the human visual system. This means that the coefficients relating to those components to which the visual system is less sensitive, namely the high-frequency components, may be quantized using large quantization values without loss of image quality. Coefficients relating to components to which the visual system is more sensitive, namely the low-frequency components, are quantized using smaller quantization values.
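The quantization and de-quantization steps of equations (1) and (2) can be sketched as follows. The function names `quantize` and `dequantize`, the coefficient values, and the quantization values are all assumed for illustration; a real encoder would take its quantization values from a quantization table.

```python
import math

def quantize(coeffs, qvals):
    """Equation (1): a_i = floor(lambda_i / q_i + 0.5)."""
    return [math.floor(lam / q + 0.5) for lam, q in zip(coeffs, qvals)]

def dequantize(levels, qvals):
    """Equation (2): lambda_i' = q_i * a_i."""
    return [q * a for a, q in zip(levels, qvals)]

# Assumed example values, chosen only for illustration.
coeffs = [302.6, -13.2, 4.9, -0.7]
qvals = [16, 11, 10, 16]

levels = quantize(coeffs, qvals)        # [19, -1, 0, 0]
restored = dequantize(levels, qvals)    # [304, -11, 0, 0]
```

Each restored value is the multiple of its quantization value nearest to the original coefficient, so the error per coefficient never exceeds half of that quantization value. The two small coefficients quantize to zero, which is where most of the compression comes from.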
The inverse transform also generally produces non-integer data. Usually the decoded data are required to be in integer form. For example, systems for the playback of audio data or the display of image data generally accept input in the form of integers. For this reason, a transform decoder generally includes a step that converts the non-integer data from the inverse transform to integer data, either by truncation or by rounding to the nearest integer. There is also often a limit on the range of the integer data output from the decoding process in order that the data may be stored in a given number of bits. For this reason the decoder also often includes a "clipping" stage that ensures that the output data are in an acceptable range. If the acceptable range is [a, b], then all values less than a are changed to a, and all values greater than b are changed to b.
These rounding and clipping processes are often considered an integral part of the decoder, and they are the cause of inaccuracies in decoded data, particularly when decoded data are re-encoded. For example, the JPEG standard (Part 1) specifies that a source image sample is defined as an integer with precision P bits, with any value in the range 0 to 2^P − 1. The decoder is expected to reconstruct the output from the inverse discrete cosine transform (IDCT) to the specified precision. For the baseline JPEG coding P is defined to be 8; for other JPEG DCT-based coding P can be 8 or 12. The MPEG-2 video standard states in Annex A (Discrete Cosine Transform), "The input to the forward transform and the output from the inverse transform is represented with 9 bits."
For the JPEG standard, the compliance test data for the encoder source image test data and the decoder reference test data are 8 bit/sample integers. Even though rounding to integers is typical, some programming languages convert from floating point to integers by truncation. Implementations in software that accept this conversion to integers by truncation introduce larger errors into the real-domain integer output from the inverse transform.
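The conversion to integers and the clipping stage described above can be sketched as follows. The function name `to_sample` and the example values are hypothetical, and the sketch assumes the unsigned JPEG-style range of 0 to 2^P − 1.

```python
import math

def to_sample(value, precision_bits=8, truncate=False):
    """Convert one inverse-transform output to an integer in [0, 2^P - 1]."""
    if truncate:
        n = math.trunc(value)          # drops the fractional part
    else:
        n = math.floor(value + 0.5)    # rounds to the nearest integer
    hi = (1 << precision_bits) - 1
    return max(0, min(hi, n))          # clipping stage

# Assumed example outputs of an inverse transform.
outputs = [127.6, -3.2, 260.9, 64.49]
rounded = [to_sample(v) for v in outputs]                    # [128, 0, 255, 64]
truncated = [to_sample(v, truncate=True) for v in outputs]   # [127, 0, 255, 64]
```

Rounding leaves an error of at most one half per sample, whereas truncation can leave an error of almost one, as the first example value shows. Clipping discards whatever part of the value fell outside the permitted range.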
The term "high-precision" is used herein to refer to numerical values which are stored to a precision more accurate than the precision used when storing the values as integers. Examples of high-precision numbers are floating-point or fixed-point representations of numbers.
In performing a printing operation, there is a need for the printer to be able to merge a portion of an 8×8 Discrete Cosine Transform (DCT) domain block with the complementary portion of a second DCT block quickly. The traditional approach involves conversion from the DCT domain for each of the original blocks to the respective real domains (each a 64-sample space) via an inverse DCT, followed by merging the components of interest from each block in the real domain, and finally transforming this new image back to the DCT domain. This method involves more computations than are necessary and lengthens total processing time.
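The traditional approach can be sketched in one dimension as follows, for a single row of eight samples. The helper names, the `keep` parameter, and the sample values are assumptions made for illustration. The sketch works in floating point and omits the rounding and clipping that a real decoder would apply after the inverse DCT.

```python
import math

N = 8

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix D, so that D times samples gives coefficients."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                     for j in range(n)])
    return rows

def matvec(m, v):
    """Multiply matrix m by column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

D = dct_matrix()
D_T = [list(col) for col in zip(*D)]   # transpose of D, which is also its inverse

def merge_via_real_domain(g_dct, h_dct, keep):
    """Keep the first `keep` samples of G and the remaining samples of H."""
    g = matvec(D_T, g_dct)             # inverse DCT of the first block
    h = matvec(D_T, h_dct)             # inverse DCT of the second block
    merged = g[:keep] + h[keep:]       # sample-by-sample merge in the real domain
    return matvec(D, merged)           # forward DCT of the merged block

# Assumed example rows of samples.
g_samples = [10, 20, 30, 40, 50, 60, 70, 80]
h_samples = [5, 5, 5, 5, 200, 200, 200, 200]
f_dct = merge_via_real_domain(matvec(D, g_samples), matvec(D, h_samples), keep=3)
f_samples = matvec(D_T, f_dct)         # approximately [10, 20, 30, 5, 200, 200, 200, 200]
```

Each merged row costs two inverse transforms and one forward transform, regardless of how few coefficients in the input blocks are non-zero.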
While it is commonplace for graphics utilities to merge two independent images with brute force pixel-by-pixel merges as described above, it is also possible to approach the problem by working exclusively in the frequency domain. This approach potentially has at least two advantages over the traditional method in that it (1) provides for faster and more flexible image processing for the printing industry than is available with current technologies and (2) eliminates errors which routinely take place when working in the real domain with fixed precision computation by avoiding the real domain entirely.
Ut-Va Koc and K. J. Ray Liu in "DCT-Based Motion Estimation", IEEE Transactions on Image Processing, Vol. 7, No. 7, July 1998, pp. 948-965, and Ut-Va Koc and K. J. Ray Liu in "Interpolation-Free Subpixel Motion Estimation Techniques in DCT Domain", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 8, No. 4, August 1998, pp. 460-487, describe a method by which pixel motion can be approximated in subsequent image frames. The key mechanism in both articles is the construction of impulse functions derived from the inverse discrete cosine transform (IDCT) and inverse discrete sine transform (IDST) of pseudo phases which express a relationship between the discrete cosine transform (DCT) and discrete sine transform (DST) of temporally shifted image samples. The key difference between the two articles is that the second article extends the first to enable detection of motion at the subpixel level. Neither article teaches including the de-quantization and/or re-quantization in the impulse solutions.
Shih-Fu Chang and David G. Messerschmitt in "Manipulation and Compositing of MC-DCT Compressed Video", IEEE Journal on Selected Areas in Communications, Vol. 13, No. 1, January 1995, pp. 1-11, describe compression algorithms using the discrete cosine transform (DCT) with or without motion compensation (MC). Compression systems of this kind include JPEG (Joint Photographic Experts Group), motion JPEG, MPEG (Moving Picture Experts Group), and the H.261 standard. Chang and Messerschmitt derive a set of algorithms in which video signals are represented by quantized transform coefficients. Their paper uses the term "quantized DCT coefficients" to mean the de-quantized coefficients, since they explain that these quantized DCT coefficients "can be obtained after the inverse quantizer in the decoder . . . " (p. 2). Footnote 2 notes, " . . . we assume the transform coefficients are by default quantized, so we can take advantage of the fact that many coefficients are truncated to zero after quantization." The de-quantization and/or re-quantization are not included in their transform domain equations and operations.
A comparison of the complexity of computation in Chang and Messerschmitt's method and that of the present invention may be illustratively made by comparing the performance at the one-dimensional block level. Consider the following sub-block acquisition from two 1×8 blocks G and H:
Appealing to the approach suggested by Chang and Messerschmitt, we first write

$$F = M_1 G + M_2 H,$$

where

$$M_1 = \begin{bmatrix} 0 & I_\sigma \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad M_2 = \begin{bmatrix} 0 & 0 \\ I_{8-\sigma} & 0 \end{bmatrix},$$

and apply the FDCT operator D to get

$$\begin{aligned} DF &= D M_1 G + D M_2 H \\ &= D M_1 D^T D G + D M_2 D^T D H \\ &= \mathrm{DCT}(M_1) \cdot \mathrm{DCT}(G) + \mathrm{DCT}(M_2) \cdot \mathrm{DCT}(H), \end{aligned}$$

where we use the fact that D is orthogonal, so that D D^T = D^T D = I_8, and that D A D^T is the two-dimensional DCT of the 8×8 matrix A. At a "worst case" level, the one-dimensional computation in Chang and Messerschmitt requires 128 multiplications and 120 additions,
whereas the one-dimensional method according to the invention disclosed in co-pending patent application Ser. No. 09/524,266 requires 95 multiplications and 103 additions. Using our present invention to extend the method to two-dimensional 8×8 blocks, for a shift along only one axis, the extended Chang and Messerschmitt formula requires 1024 (8×128) multiplications and 960 (8×120) additions. Our worst case, a shift of 1, requires for the same eight rows in the 8×8 block 760 (8×95) multiplications and 824 (8×103) additions. Our best case, a merge using half from each block, requires just 320 (8×40) multiplications and 384 (8×48) additions. Additional performance improvements can be gained by using our "fast paths".
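The Chang and Messerschmitt formulation above can be checked numerically with the following sketch, which builds M1 and M2 for an assumed shift parameter, applies their DCT-domain counterparts, and compares the result with a direct forward DCT of the shifted samples. The helper names and sample values are hypothetical. The dense operation count at the end is one plausible way to arrive at the per-row figures of 128 and 120 implied by the text; the figures for the invention are copied from the text, not derived.

```python
import math

N = 8

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix D."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                     for j in range(n)])
    return rows

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def selection_matrices(sigma, n=N):
    """M1 lifts the last sigma samples of G; M2 places the first n - sigma samples of H below them."""
    m1 = [[0.0] * n for _ in range(n)]
    m2 = [[0.0] * n for _ in range(n)]
    for i in range(sigma):
        m1[i][n - sigma + i] = 1.0     # identity block I_sigma in the upper right
    for i in range(n - sigma):
        m2[sigma + i][i] = 1.0         # identity block I_(n - sigma) in the lower left
    return m1, m2

D = dct_matrix()
D_T = [list(col) for col in zip(*D)]

def shift_in_dct_domain(g_dct, h_dct, sigma):
    m1, m2 = selection_matrices(sigma)
    t1 = matmul(matmul(D, m1), D_T)    # DCT(M1) = D M1 D^T, precomputable per sigma
    t2 = matmul(matmul(D, m2), D_T)    # DCT(M2) = D M2 D^T
    return [a + b for a, b in zip(matvec(t1, g_dct), matvec(t2, h_dct))]

# Assumed example rows of samples and shift parameter.
g = [10, 20, 30, 40, 50, 60, 70, 80]
h = [5, 15, 25, 35, 45, 55, 65, 75]
sigma = 3
f_dct = shift_in_dct_domain(matvec(D, g), matvec(D, h), sigma)
f_direct = matvec(D, g[N - sigma:] + h[:N - sigma])

# Dense worst case per 1x8 row: two 8x8 matrix-vector products plus one vector sum.
mults_per_row = 2 * N * N                  # 128
adds_per_row = 2 * N * (N - 1) + N         # 120
counts_8x8 = {
    "Chang-Messerschmitt": (8 * mults_per_row, 8 * adds_per_row),  # (1024, 960)
    "invention, shift of 1": (8 * 95, 8 * 103),                    # (760, 824)
    "invention, half merge": (8 * 40, 8 * 48),                     # (320, 384)
}
```

The two results `f_dct` and `f_direct` agree to within floating-point error, which confirms that the matrix identity holds for this choice of M1 and M2. The count assumes that every entry of DCT(M1) and DCT(M2) is treated as non-zero.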
Weidong Kou and Tore Fjällbrant in "A Direct Computation of DCT Coefficients for a Signal Block Taken from Two Adjacent Blocks", IEEE Transactions on Signal Processing, Vol. 39, No. 7, July 1991, pp. 1692-1695, and Weidong Kou and Tore Fjällbrant in "Fast Computation of Transform Coefficients for a Subadjacent Block for a Transform Family", IEEE Transactions on Signal Processing, Vol. 39, No. 7, July 1991, pp. 1695-1699, present in the first article a method for direct computation of DCT coefficients of a one-dimensional signal block composed of halves of two adjacent signal blocks from the DCT coefficients of the two original blocks. The key mechanism in this approach is the use of matrix factorization/matrix algebra. The result is a method which (for a 1×8 signal block) requires 60 multiplications and 68 additions, whereas the worst case of the method according to the present invention applied to a shift over 4 requires 40 multiplications and 60 additions.
The method of the present invention gives the best known results for the shift over 4 pixels. It also gives the best known results for merges with 4 pixels from each block. Moreover, it excels in that it is just as easy to derive the algorithm for arbitrary signal blocks formed from σ samples of one block and 8 − σ samples from the adjacent block as it is for "½ of one and ½ of the other". By contrast, the algorithm of Kou and Fjällbrant, in the words of its authors (paraphrased), yields complex derivations when one attempts to extend it to any case other than "½ and ½". Neither article teaches including the de-quantization in its equations, nor the method of obtaining the solutions taught in this invention.
It is therefore an object of the present invention to provide transform domain processing to shift and/or merge transformed data which increases the speed of processing of color images by color printers.
According to the invention, a two-dimensional algorithm performs the merging of complementary portions from two independent overlapped images on the same 8×8 grid without the computational expense of conversion to and from the real domain. The merging parameters on the horizontal and vertical axes are independent. Because non-zero DCT coefficients are generally sparse, this algorithm lends itself to the development of special cases which are even faster.
The algorithm according to the present invention meets these criteria as well as:
1) Providing for faster and more flexible image processing for the printing industry than is available with current technologies. As an example, consider that JPEG images are often padded on the right and bottom when the image of interest has pixel dimensions which are not multiples of eight. If this image is rotated by ninety degrees, the padded areas suddenly take on new precedence as the top or left side of the image. By quickly performing one-dimensional shifts of the image border in each of the two dimensions via the method of this invention, the boundaries of the image are redefined and quality is restored.
2) Eliminating errors which routinely take place when working in the real domain with fixed precision computation by avoiding the real domain entirely. See, U.S. patent applications Ser. Nos. 09/186,245, 09/186,249, and 09/186,247, cited above.