Without compression, the transmission of images and video requires an unacceptable bandwidth in many applications. As a result, methods of compressing images and video have been the subject of numerous research publications. Image and video compression schemes convert an image consisting of an array of pixels into a sequence of bits. Compression involves transforming the image to a form that can be represented in fewer bits without losing the essential features of the original image. The transformed image is then transmitted over a communication link and the inverse transformation is applied at the receiver to recover the image or a reasonable facsimile thereof.
The wavelet transform has become a powerful tool for signal processing and image compression. A number of image and video compression systems based on the wavelet transform and zero-tree coding schemes have been developed.
In existing image and video compression systems and standards, including the image coding standard JPEG2000 described for example in D. S. Taubman and M. W. Marcellin, “JPEG2000: Image Compression Fundamentals, Standards and Practice”, Kluwer Academic Publishers, Boston, 2002, pages 423 to 430, the two-dimensional (2-D) wavelet transform of an image is always carried out by one-dimensional (1-D) wavelet filtering along the horizontal and vertical directions, if the filters are separable. This conventional wavelet transform, shown in FIG. 1, was described by J. W. Woods et al., in an article “Subband coding of images,” IEEE Trans. Acoustics, Speech, Signal Processing, Vol. ASSP-34, No. 5, pp. 1278-1288, 1986, and by S. G. Mallat in an article “A theory for multiresolution signal decomposition: The wavelet representation,” IEEE Trans. Pattern Anal. Machine Intell., Vol. 11, pp. 674-693, July 1989, which are incorporated herein by reference.
FIG. 1 schematically shows the main steps of this prior-art multi-resolution wavelet transform method at a single resolution level, the steps forming a pyramid or a tree-like structure. First, an input image 100 is transformed in the horizontal direction by applying a low-pass wavelet filter in step 101 and a high-pass wavelet filter in a parallel step 102 along rows of the image, producing two horizontally filtered images. Next, in steps 111 and 112 the two horizontally filtered images are sub-sampled by discarding, respectively, odd and even columns therefrom, resulting in low-pass (L) and high-pass (H) wavelet coefficient images. In a next set of parallel steps 122-125, the low-pass and high-pass filters are applied in the vertical direction along the columns of each of the L and H wavelet coefficient images, producing four filtered images. These four filtered images are down-sampled in steps 132-135 in the vertical direction by discarding e.g. odd rows in vertically high-pass filtered images, and even rows in vertically low-pass filtered images, finally producing four sub-band images respectively composed of LL, LH, HL and HH coefficients. The process repeats for the LL sub-band image as an input image with wavelet filters at a next resolution level. The method is thus recursive and produces a hierarchical sequence of down-sampled low-pass and high-pass filtered images at successively decreasing resolution levels.
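The separable decomposition described above can be sketched as follows. This is a minimal illustration only, not the filter bank of FIG. 1: it assumes the two-tap Haar filter pair (chosen here for brevity), image dimensions divisible by two at every level, and it folds filtering and sub-sampling into a single pass. The function names `haar_1d`, `dwt2_level` and `dwt2` are hypothetical, and the sub-band labels assume that the first letter refers to the horizontal filter and the second to the vertical filter.

```python
def haar_1d(seq):
    """One level of Haar analysis on an even-length sequence.

    Returns (low, high), each half the input length. Filtering and
    sub-sampling by two are combined: only every second output is computed.
    """
    if len(seq) % 2:
        raise ValueError("sequence length must be even")
    low = [(seq[i] + seq[i + 1]) / 2 for i in range(0, len(seq), 2)]
    high = [(seq[i] - seq[i + 1]) / 2 for i in range(0, len(seq), 2)]
    return low, high


def transpose(img):
    """Swap rows and columns of a list-of-lists image."""
    return [list(col) for col in zip(*img)]


def filter_rows(img):
    """Apply haar_1d to every row; return the low-pass and high-pass images."""
    lows, highs = [], []
    for row in img:
        lo, hi = haar_1d(row)
        lows.append(lo)
        highs.append(hi)
    return lows, highs


def dwt2_level(img):
    """One resolution level: return the sub-bands (LL, LH, HL, HH)."""
    L, H = filter_rows(img)                     # horizontal pass along rows
    LL_t, LH_t = filter_rows(transpose(L))      # vertical pass on the L image
    HL_t, HH_t = filter_rows(transpose(H))      # vertical pass on the H image
    return tuple(transpose(b) for b in (LL_t, LH_t, HL_t, HH_t))


def dwt2(img, levels):
    """Multi-level transform: the LL sub-band is the input to the next level."""
    ll, details = img, []
    for _ in range(levels):
        ll, lh, hl, hh = dwt2_level(ll)
        details.append((lh, hl, hh))
    return ll, details
```

For example, `dwt2_level([[1, 3], [5, 7]])` yields one coefficient per sub-band, and a constant image produces detail sub-bands that are identically zero at every level.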
A number of methods have been proposed to code the wavelet coefficients resulting from the multi-level wavelet transform. The most important methods include the embedded zerotree wavelet (EZW) method disclosed by Shapiro in U.S. Pat. No. 5,315,670, and the set partitioning in hierarchical trees (SPIHT) method disclosed by Pearlman et al. in U.S. Pat. No. 5,764,807. These zero-tree structures employ a parent-child relationship between a coefficient at one level of the wavelet transform and four coefficients within a 2×2 square at the next lower level.
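The parent-child relationship can be illustrated with a short sketch. It is a simplification under stated assumptions and does not reproduce either patented coder: coordinates are taken relative to each sub-band, `bands` is a hypothetical list of same-orientation sub-bands ordered from coarsest to finest with each one twice the size of the previous, and the helper names `children` and `is_zerotree_root` are introduced here for illustration only.

```python
def children(r, c):
    """Return the 2x2 block of child positions for a coefficient at (r, c)."""
    return [(2 * r, 2 * c), (2 * r, 2 * c + 1),
            (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]


def is_zerotree_root(bands, level, r, c, threshold):
    """True if the coefficient and all of its descendants are insignificant.

    bands[0] is the coarsest sub-band of one orientation; bands[k + 1] has
    twice as many rows and columns as bands[k] (an assumption of this sketch).
    A coefficient is insignificant when its magnitude is below the threshold.
    """
    if abs(bands[level][r][c]) >= threshold:
        return False
    if level + 1 == len(bands):      # finest level: no children to inspect
        return True
    return all(is_zerotree_root(bands, level + 1, cr, cc, threshold)
               for cr, cc in children(r, c))
```

When an entire tree is insignificant with respect to the current threshold, a coder of this family can signal that fact with a single symbol rather than coding each descendant separately.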
Prior-art wavelet filters used in the wavelet transform are typically finite impulse response (FIR) filters that are implemented with a non-recursive convolution structure, or with a lifting structure. FIG. 2 shows a prior-art convolution structure of a symmetric FIR filter with a length of 5. When an FIR filter is applied to a row, a column of an image, or along a motion trajectory within a video sequence, the sequence of pixels, i.e. image samples, along the row, the column, or the motion trajectory has to be extended in order to result in an output sequence of the same length. FIG. 3 illustrates a popularly used prior-art method of symmetric extension, which is often considered to be the best approach in the literature to date. If the extension is done by repeating the first and last samples of the sequence, the sequence cannot be perfectly reconstructed from the resulting wavelet coefficients using conventional convolutional FIR filters, as shown for example in an article by H. J. Barnard et al., “Efficient signal extension for subband/wavelet decomposition of arbitrary length signals,” SPIE Vol. 2096, Visual Communications and Image Processing, 1993, pp. 966-975.
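A length-5 symmetric FIR filter with boundary extension can be sketched as follows. The sketch rests on assumptions that are not taken from FIGS. 2 and 3: the extension is the whole-sample variant, which mirrors the sequence about its first and last samples without repeating them, and the coefficients in the usage note are those of the well-known 5/3 low-pass analysis filter, used purely as an example. The names `symmetric_extend` and `fir_symmetric5` are hypothetical.

```python
def symmetric_extend(seq, pad):
    """Whole-sample symmetric extension by `pad` samples on each side.

    The boundary samples themselves are not repeated:
    [a, b, c, d] with pad=2 becomes [c, b, a, b, c, d, c, b].
    """
    n = len(seq)
    if pad >= n:
        raise ValueError("pad must be smaller than the sequence length")
    left = [seq[i] for i in range(pad, 0, -1)]
    right = [seq[n - 1 - i] for i in range(1, pad + 1)]
    return left + list(seq) + right


def fir_symmetric5(seq, h0, h1, h2):
    """Non-recursive symmetric 5-tap filter with same-length output.

    y[n] = h0*x[n] + h1*(x[n-1] + x[n+1]) + h2*(x[n-2] + x[n+2])
    Samples sharing a coefficient are added first, so each output
    needs three multiplications instead of five.
    """
    x = symmetric_extend(seq, 2)     # x[n + 2] is the original sample n
    return [h0 * x[n + 2]
            + h1 * (x[n + 1] + x[n + 3])
            + h2 * (x[n] + x[n + 4])
            for n in range(len(seq))]
```

For instance, `fir_symmetric5(row, 6/8, 2/8, -1/8)` applies the example low-pass filter to a row and returns a sequence of the same length; because these coefficients sum to one, a constant row passes through unchanged.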
These prior-art methods of wavelet transform and systems of image and video compression have the following shortcomings. When an image is filtered in horizontal and vertical directions, the filter often crosses edges in the image, i.e. elongated geometrical structures in the image across which an image value drastically changes. A sequence of pixels across an edge usually contains a broad frequency spectrum, from low to high frequencies. The wavelet transform decomposes the energy of the pixel sequence into a large number of frequency bands, also referred to as scales. This means that many wavelet coefficients at many resolution levels are required to properly reconstruct the edge. Therefore, the conventional wavelet transform, which is not adapted to an image, does not provide a compact representation of edges. As a result, the prior-art wavelet-based image and video compression systems produce “ringing” artifacts around edges, especially at low bit rates.
The main limitation of wavelet filtering schemes currently used for signal representation is that they do not take advantage of the geometrical regularity of many signal structures. Indeed, these wavelet filters are composed of vectors having a support which is not adapted to the elongation of signal structures such as regular edges. Curvelet bases have recently been introduced in E. Candes and D. Donoho, “Curvelets: A surprisingly effective nonadaptive representation of objects with edges,” tech. rep., Stanford Univ., 1999, the contents of which are incorporated by reference herein, to take partial advantage of the geometrical regularity of the signal, by using elongated support zones along different directions. Yet, this strategy has not been able to improve results currently obtained with a wavelet basis on natural images, because it does not explicitly incorporate the geometrical information.
To incorporate this geometrical regularity, edge-oriented representations have been developed in image processing. An edge detector computes an edge map with discretized differential operators and computes some coefficients in order to reconstruct an approximation of the image grey level between edges. In S. Carlsson, “Sketch based coding of gray level images,” Signal Processing, Vol. 15, pp. 57-83, July 1988, the contents of which are incorporated by reference herein, an edge detector computes an edge map with discretized derivative operators. For compression applications, chain algorithms are used to represent the chains of edge points with as few bits as possible. The left and right pixel values along the edges are kept and an image is reconstructed from these left and right values with a diffusion process. If all edges were step edges with no noise, this representation would be appropriate, but this is rarely the case, and as a result the reconstructed image is not sufficiently close to the original image. An error image is computed and coded with a Laplacian pyramid, but this requires too many bits to be competitive with a procedure such as JPEG2000.
A different strategy is used by several other methods, which encode coefficients that represent the image variations in regions between edges as opposed to the image variations across edges. In I. Masaki, U. Desai, A. Chandrakasan, and B. Horn, “Method and apparatus for compressing and decompressing a video image” U.S. Pat. No. 5,790,269, instead of keeping the image grey levels at the left and right of an edge point, the parameters of a linear regression are kept to approximate the image grey levels along horizontal and vertical lines between two edge points. A similar strategy is used in T. Echigo, J. Maeda, J.-K. Hong, and M. Ioka, “System for and method of processing digital images,” U.S. Pat. No. 5,915,046, where each region is coded using a polygonal surface approximation. In the two above referenced methods, the coefficients are more global and thus less sensitive to noise but edges are still represented by a discontinuity between two regions.
In A. Mertins, “Image compression via edge-based wavelet transform,” Opt. Eng., Vol. 38, No. 6, pp. 991-1000, 1999, the grey level image values are decomposed in a one-dimensional discrete wavelet basis along horizontal or vertical lines between two edge points. In L. Bouchard and R. Askenatis, “Region-based texture coding and decoding method and corresponding system,” U.S. Pat. No. 5,898,798, the image is segmented into regions, which are coded independently using a quincunx wavelet transform. In the two above referenced wavelet methods, the whole image information is represented but these procedures do not use the geometrical image regularity to decorrelate the coefficients produced by the image variations on each side of the edges.
U.S. Pat. No. 6,836,569, issued to Le Pennec and Mallat, discloses a processing method and system for n-dimensional signals such as images, wherein foveal filtering and bandelet transforms are used to transform an image taking into account geometrical features therein such as edges. First, foveal processing of the image data is performed to compute foveal coefficients along a set of curved trajectories in the image by using foveal filters with a support across trajectories and along coordinates in the image. A second transform is then performed using bandelet filters, or two-dimensional anisotropic wavelets that are warped along a geometric flow in the image. This method takes advantage of image content such as regular elongated geometrical structures therein, or edges, but requires rather complicated processing using two-dimensional bandelet filters with additional foveal pre-processing.
Further, the prior-art wavelet transform methods of image processing using non-recursive FIR filters require a large memory. Memory size may be critical for temporal wavelet transform and for certain applications such as digital cameras, see e.g. U.S. Pat. No. 6,343,155. The symmetric extension, although it is often considered heretofore in the literature to be the best extension method, introduces additional distortion into decoded images. As described by C. Christopoulos, A. Skodras, and T. Ebrahimi, in an article “The JPEG2000 still image coding system: An overview,” IEEE Trans. Consumer Electronics, Vol. 46, No. 4, pp. 1103-1127, November 2000, when an image is divided into tiles and each tile is compressed independently using JPEG2000, this additional distortion can be observed as block artifacts around the boundary of every tile in the decoded image.
Accordingly, it is an object of the present invention to provide a sufficiently simple method of image transformation based on curved wavelet transform that is adaptive to edges and other regular elongated geometrical structures in the image and yields a compact and accurate representation thereof.
It is another object of the invention to provide an image and video data compression system based on multi-level curved wavelet transform that provides a high compression capability.
It is another object of the present invention to provide a method of recursive wavelet filtering and an image compression system employing the same that requires a small memory and that is easy to implement in hardware.