The current invention relates to the processing of images such as photographs, drawings, and other two dimensional displays. It further relates to the processing of such images which are captured in digital format or after they have been converted to or expressed in digital format. This invention further relates to use of novel coding methods to increase the speed and compression ratio for digital image storage and transmission while avoiding introduction of undesirable artifacts into the reconstructed images.
In general, image processing is the analysis and manipulation of two-dimensional representations, which can comprise photographs, drawings, paintings, blueprints, x-rays of medical patients, or indeed abstract art or artistic patterns. These images are all two-dimensional arrays of information. Until fairly recently, images have comprised almost exclusively analog displays of analog information, for example, conventional photographs and motion pictures. Even the signals encoding television pictures, notwithstanding that the vertical scan comprises a finite number of lines, are fundamentally analog in nature.
Beginning in the early 1960s, images began to be captured or converted and stored as two-dimensional digital data, and digital image processing followed. At first, images were recorded or transmitted in analog form and then converted to digital representation for manipulation on a computer. Currently digital capture and transmission are on their way to dominance, in part because of the advent of charge coupled device (CCD) image recording arrays and in part because of the availability of inexpensive high speed computers to store and manipulate images.
An important task of image processing is the correction or enhancement of a particular image. For example, digital enhancement of images of celestial objects taken by space probes has provided substantial scientific information. However, the current invention relates primarily to compression for transmission or storage of digital images and not to enhancement.
One of the problems with digital images is that a complete single image frame can require up to several megabytes of storage space or transmission bandwidth. That is, one of today's 3½ inch floppy discs can hold at best a little more than one gray-scale frame and sometimes substantially less than one whole frame. A full-page color picture, for example, uncompressed, can occupy 30 megabytes of storage space. Storing or transmitting the vast amounts of data which would be required for real-time uncompressed high resolution digital video is technologically daunting and virtually impossible for many important communication channels, such as the telephone line. The transmission of digital images from space probes can take many hours or even days if insufficiently compressed images are involved. Accordingly, there has been a decades long effort to develop methods of extracting from images the information essential to an aesthetically pleasing or scientifically useful picture without degrading the image quality too much and especially without introducing unsightly or confusing artifacts into the image.
The basic approach has usually involved some form of coding of picture intensities coupled with quantization. One approach is block coding; another approach, mathematically equivalent with proper phasing, is multiphase filter banks. Frequency based multi-band transforms have long found application in image coding. For instance, the JPEG image compression standard, W. B. Pennebaker and J. L. Mitchell, “JPEG: Still Image Compression Standard,” Van Nostrand Reinhold, 1993, employs the 8×8 discrete cosine transform (DCT) at its transformation stage. At high bit rates, JPEG offers almost lossless reconstructed image quality. However, when more compression is needed, annoying blocking artifacts appear since the DCT bases are short and do not overlap, creating discontinuities at block boundaries.
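The block coding approach described above can be illustrated with a minimal sketch of an 8×8 orthonormal DCT followed by uniform quantization. This is not the JPEG reference implementation: the helper names (`dct2`, `idct2`, `quantize`, `dequantize`) and the single uniform step size are assumptions made for illustration, whereas JPEG proper uses per-coefficient quantization tables and entropy coding.

```python
import math

N = 8  # block size used by the DCT stage discussed above


def dct_matrix(n=N):
    """Return the orthonormal DCT-II matrix C, so that coefficients = C x."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


C = dct_matrix()
CT = transpose(C)


def dct2(block):
    """Separable 2-D DCT of one N x N block: C * X * C^T."""
    return matmul(matmul(C, block), CT)


def idct2(coeffs):
    """Inverse 2-D DCT: C^T * Y * C (C is orthonormal)."""
    return matmul(matmul(CT, coeffs), C)


def quantize(coeffs, step):
    """Uniform scalar quantization (illustrative; not a JPEG table)."""
    return [[round(c / step) for c in row] for row in coeffs]


def dequantize(levels, step):
    return [[q * step for q in row] for row in levels]
```

Because each block is transformed and quantized independently, a coarse `step` discards detail separately in every block, and nothing in this structure ties the reconstruction on one side of a block boundary to the other side. That independence is the source of the blocking discontinuities noted above.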
The wavelet transform, on the other hand, with long, varying-length, and overlapping bases, has elegantly solved the blocking problem. However, the transform's computational complexity can be significantly higher than that of the DCT. This complexity gap is partly in terms of the number of arithmetical operations involved, but more importantly, in terms of the memory buffer space required. In particular, some implementations of the wavelet transform require many more operations per output coefficient as well as a large buffer.
An interesting alternative to wavelets is the lapped transform, e.g., H. S. Malvar, Signal Processing with Lapped Transforms, Artech House, 1992, where pixels from adjacent blocks are utilized in the calculation of transform coefficients for the working block. The lapped transforms outperform the DCT on two counts: (i) from the analysis viewpoint, they take into account inter-block correlation and hence provide better energy compaction; (ii) from the synthesis viewpoint, their overlapping basis functions decay asymptotically to zero at the ends, reducing blocking discontinuities dramatically.
Nevertheless, lapped transforms have not yet been able to supplant the unadorned DCT in international standard coding routines. The principal reason is that the modest improvement in coding performance available up to now has not been sufficient to justify the significant increase in computational complexity. In the prior art, therefore, lapped transforms remained too computationally complex for the benefits they provided. In particular, the previous lapped transforms somewhat reduced but did not eliminate the annoying blocking artifacts.
It is therefore an object of the current invention to provide a new transform which is simple and fast enough to replace the bare DCT in international standards, in particular in JPEG and MPEG-like coding standards. It is another object of this invention to provide an image transform which has overlapping basis functions so as to avoid blocking artifacts. It is a further object of this invention to provide a lapped transform which is approximately as fast as, but more efficient for compression than, the bare DCT. It is yet another object of this invention to provide dramatically improved speed and efficiency using a lapped transform with lifting steps in a butterfly structure with dyadic-rational coefficients. It is yet a further object of this invention to provide a transform structure such that for a negligible complexity surplus over the bare DCT a dramatic coding performance gain can be obtained both from a subjective and objective point of view while blocking artifacts are completely eliminated.
In the current invention, we use a family of lapped biorthogonal transforms implementing a small number of dyadic-rational lifting steps. The resulting transform, called the LiftLT, not only has high computation speed but is well-suited to implementation via VLSI.
Moreover, it also consistently outperforms state-of-the-art wavelet based coding systems in coding performance when the same quantizer and entropy coder are used. The LiftLT is a lapped biorthogonal transform using lifting steps in a modular lattice structure, the result of which is a fast, efficient, and robust encoding system. With only 1 more multiplication (which can also be implemented with shift-and-add operations), 22 more additions, and 4 more delay elements compared to the bare DCT, the LiftLT offers a fast, low-cost approach capable of straightforward VLSI implementation while providing reconstructed images which are high in quality, both objectively and subjectively. Despite its simplicity, the LiftLT provides a significant improvement in reconstructed image quality over the traditional DCT in that blocking is completely eliminated while at medium and high compression ratios ringing artifacts are reasonably contained. The performance of the LiftLT surpasses even that of the well-known 9/7-tap biorthogonal wavelet transform with irrational coefficients. The LiftLT's block-based structure also provides several other advantages: supporting parallel processing mode, facilitating region-of-interest coding and decoding, and processing large images under severe memory constraints.
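The appeal of dyadic-rational lifting steps can be seen in a minimal sketch of a single pair of lifting steps acting on two integer channels. The coefficient 1/2 (a right shift by one bit), the order of the two steps, and the function names `lift_forward` and `lift_inverse` are assumptions chosen for illustration only; they are not asserted to be the coefficients or the arrangement of the LiftLT itself.

```python
def lift_forward(a, b, shift=1):
    """Apply two lifting steps with coefficient 1 / 2**shift.

    Each step updates one channel using only a shifted copy of the
    other, so no multiplier is needed: a shift and an add suffice.
    """
    b = b + (a >> shift)  # first lifting step: b += a / 2**shift
    a = a - (b >> shift)  # second lifting step: a -= b / 2**shift
    return a, b


def lift_inverse(a, b, shift=1):
    """Undo lift_forward by reversing the steps and flipping the signs."""
    a = a + (b >> shift)  # undo the second lifting step
    b = b - (a >> shift)  # undo the first lifting step
    return a, b
```

Each inverse step recomputes exactly the quantity that the matching forward step added or subtracted, so the rounding introduced by the shift cancels and the original integers are recovered exactly. The same reasoning applies to any dyadic-rational coefficient that can be realized with shifts and adds.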
Most generally, the current invention is an apparatus for block coding of windows of digitally represented images comprising a chain of lattices of lapped transforms with dyadic rational lifting steps. More particularly, this invention is a system of electronic devices which codes, stores or transmits, and decodes M×M sized blocks of digitally represented images, where M is an even number. The main block transform structure comprises a transform having M channels numbered 0 through M−1, half of said channel numbers being odd and half being even; a normalizer with a dyadic rational normalization factor in each of said M channels; two lifting steps with a first set of identical dyadic rational coefficients connecting each pair of adjacent numbered channels in a butterfly configuration; M/2 delay lines in the odd numbered channels; two inverse lifting steps with the first set of dyadic rational coefficients connecting each pair of adjacent numbered channels in a butterfly configuration; and two lifting steps with a second set of identical dyadic rational coefficients connecting each pair of adjacent odd numbered channels; means for transmission or storage of the transform output coefficients; and an inverse transform comprising M channels numbered 0 through M−1, half of said channel numbers being odd and half being even; two inverse lifting steps with dyadic rational coefficients connecting each pair of adjacent odd numbered channels; two lifting steps with dyadic rational coefficients connecting each pair of adjacent numbered channels in a butterfly configuration; M/2 delay lines in the even numbered channels; two inverse lifting steps with dyadic rational coefficients connecting each pair of adjacent numbered channels in a butterfly configuration; a denormalizer with a dyadic rational inverse normalization factor in each of said M channels; and a base inverse transform having M channels numbered 0 through M−1.