The invention relates generally to a multi-dimensional signal processing method and apparatus, and in particular to a method and apparatus useful for processing multi-dimensional signals, such as two-dimensional images.
The invention is particularly pertinent to the field of image data processing and compression. Image data compression is a process which encodes images for storage or transmission over a communication channel, with fewer bits than are used by the uncoded image. The goal is to reduce the amount of degradation introduced by such an encoding, for a given data rate. The invention is also relevant to the restoration of signals by removing noise and to matching and classification applications.
In signal processing, efficient procedures often require computing a stable signal representation which provides precise signal approximations with few non-zero coefficients. Signal compression applications are then implemented with quantization and entropy coding procedures. At high compression rates, it has been shown in S. Mallat and F. Falzon, “Analysis of low bit rate image transform coding,” IEEE Trans. Signal Processing, vol. 46, pp. 1027-1042, 1998, the contents of which are incorporated by reference herein, that the efficiency of a compression algorithm essentially depends upon the ability to construct a precise signal approximation from few non-zero coefficients in the representation. Noise removal algorithms are also efficiently implemented with linear or non-linear diagonal operators over such representations, including thresholding strategies. Other applications such as classification or signal matching can also take advantage of sparse signal representations to reduce the amount of computation in the classification or matching algorithms.
For signal processing, the stability requirement of the signal representation has motivated the use of bases and in particular orthogonal bases. The signal is then represented by its inner products with the different vectors of the orthogonal basis. A sparse representation is obtained by setting to zero the coefficients of smallest amplitude. The Fourier transform, which represents signals by their decomposition coefficients in a Fourier basis, mostly dominated signal processing until the 1980's. This basis is indeed particularly efficient for representing smooth signals or stationary signals. During the last twenty years, different signal representations have been constructed, with fast procedures which decompose the signal in a separable basis. Block transforms and in particular block cosine bases have found important applications in image processing. The JPEG still image coding standard is an application which quantizes and Huffman encodes the block cosine coefficients of an image. More recently, separable wavelet bases have been shown to provide a sparser image representation, which in turn improves these applications. Wavelets compute local image variations at different scales. In particular the JPEG standard is now being replaced by the JPEG-2000 standard which quantizes and encodes the image coefficients in a separable wavelet basis: “JPEG 2000, ISO/IEC 15444-1:2000,” 2000, the contents of which are incorporated by reference herein. Non-linear noise removal applications have been developed by thresholding the wavelet coefficients of noisy signals in D. Donoho and I. Johnstone, “Ideal spatial adaptation via wavelet shrinkage,” Biometrika, vol. 81, pp. 425-455, December 1994, the contents of which are incorporated by reference herein.
To obtain a sparser representation, foveal procedures gather high resolution data only in the neighborhood of selected points in the image, as described in E. Chang, S. Mallat, and C. Yap, “Wavelet foveation,” Applied and Computational Harmonic Analysis, pp. 312-335, 2000, the contents of which are incorporated by reference herein. This information is equivalent to computing wavelet coefficients only in the neighborhood of specific points, as shown in the above reference. This strategy is similar to the behavior of a retina, which provides high resolution measurements where the fovea is centered and a resolution which decreases as the distance to the fovea center increases. Applications to image compression have also been developed in the above reference by Chang et al.
The main limitation of bases such as wavelet or block cosine bases, currently used for signal representation, is that these bases do not take advantage of the geometrical regularity of many signal structures. Indeed, these bases are composed of vectors having a support which is not adapted to the elongation of signal structures such as regular edges. Curvelet bases have recently been introduced in E. Candes and D. Donoho, “Curvelets: A surprisingly effective nonadaptive representation of objects with edges,” tech. rep., Stanford Univ., 1999, the contents of which are incorporated by reference herein, to take partial advantage of the geometrical regularity of the signal, by using elongated vectors along different directions. Yet, this strategy has not been able to improve on results currently obtained with a wavelet basis on natural images, because it does not explicitly incorporate the geometrical information.
To incorporate this geometrical regularity, edge oriented representations have been developed in image processing. An edge detector computes an edge map with discretized differential operators and computes some coefficients in order to reconstruct an approximation of the image grey level between edges. In S. Carlsson, “Sketch based coding of gray level images,” IEEE Transaction on Signal Processing, vol. 15, pp. 57-83, July 1988, the contents of which are incorporated by reference herein, an edge detector computes an edge map with discretized derivative operators. For compression applications, chain algorithms are used to represent the chains of edge points with as few bits as possible. The left and right pixel values along the edges are kept and an image is reconstructed from these left and right values with a diffusion process. If all edges were step edges with no noise, this representation would be appropriate, but this is rarely the case, and as a result the reconstructed image is not sufficiently close to the original image. An error image is computed and coded with a Laplacian pyramid, but this requires too many bits to be competitive with a procedure such as JPEG-2000.
The above referenced method of Carlsson has been extended in C.-Y. Fu and L. I. Petrich, “Image compression technique.” U.S. Pat. No. 5,615,287, the contents of which are incorporated by reference herein, by keeping weighted average values along the left and right sides of the edges. Although the information is different, there is still little information to characterize the image transition when the edge is not a step edge. Another extension of the method of Carlsson has been proposed in D. Geiger, “Image compression method and apparatus.” U.S. Pat. No. 5,416,855, the contents of which are incorporated by reference herein. An iterative process defines a set of edge pixels and assigns a value to them. A reconstructed image is then obtained from these values with a diffusion process. This representation can contain more accurate information on the image than that of Carlsson but it then requires many pixels to reconstruct the different types of edges and is therefore not sparse enough.
In S. Mallat and S. Zhong, “Characterization of signals from multiscale edges,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 7, pp. 710-732, 1992, the contents of which are incorporated by reference herein, a wavelet edge based image representation is computed, which carries more information than the above referenced method of Carlsson. However, this representation requires a different edge map at each scale of the wavelet transform, which is a handicap in producing a sparse representation. In X. Xue, “Image compression based on low-pass wavelet transform and multi-scale edge compensation,” in Data Compression Conference, p. 559, 1999, the contents of which are incorporated by reference herein, a multiscale edge map is also computed.
A different strategy is used by several other methods which encode coefficients that represent the image variations in regions between edges as opposed to the image variations across edges. In I. Masaki, U. Desai, A. Chandrakasan, and B. Horn, “Method and apparatus for compressing and decompressing a video image.” U.S. Pat. No. 5,790,269, the contents of which are incorporated by reference herein, instead of keeping the image grey levels at the left and right of an edge point, the parameters of a linear regression are kept to approximate the image grey levels along horizontal and vertical lines between two edge points. A similar strategy is used in T. Echigo, J. Maeda, J.-K. Hong, and M. Ioka, “System for and method of processing digital images,” U.S. Pat. No. 5,915,046, the contents of which are incorporated by reference herein, where each region is coded using a polygonal surface approximation. In the two above referenced methods, the coefficients are more global and thus less sensitive to noise than in the above referenced method of Carlsson, but edges are still represented by a discontinuity between two regions.
In A. Mertins, “Image compression via edge-based wavelet transform,” Opt. Eng., vol. 38, no. 6, pp. 991-1000, 1999, the contents of which are incorporated by reference herein, the grey level image values are decomposed in a one-dimensional discrete wavelet basis along horizontal or vertical lines between two edge points. In L. Bouchard and R. Askenatis, “Region-based texture coding and decoding method and corresponding system.” U.S. Pat. No. 5,898,798, the contents of which are incorporated by reference herein, the image is segmented into regions which are coded independently using a quincunx wavelet transform. In the two above referenced wavelet methods, the whole image information is represented but these procedures do not use the geometrical image regularity to decorrelate the coefficients produced by the image variations on each side of the edges.
Accordingly, there exists a need in the art for improving multi-dimensional signal compression or processing, by computing representations which carry enough information to reproduce all types of edges in images, lead to sparse representations with decorrelation procedures that take advantage of the geometrical signal regularity along edges, and are numerically stable.
It is an object of this invention to devise a method and means to construct a sparse and stable foveal representation of multi-dimensional (n-dimensional) signals by taking advantage of the regularity of their geometrical structures. It is yet another object of this invention to build a system that compresses signals by quantizing and encoding the coefficients of this sparse foveal representation. Another object of this invention is to remove noise from signals by diagonal processing within this foveal representation. Another object of this invention is to match structures of two different signals by processing their computed foveal representation. Yet another object of this invention is to classify signals by processing their foveal representation.
The invention comprises a foveal trajectory processor that computes foveal coefficients which are inner products of the signal with one-dimensional foveal filters along trajectories, and which specify the signal variations in a sufficiently large neighborhood of the trajectories. The bandelet processor then yields a sparse representation by removing the correlation between foveal coefficients due to the geometrical signal regularity. The resulting bandelet coefficients are decomposition coefficients in a basis composed of vectors elongated along trajectories. This representation therefore has the stability property of representations in bases, while having the same geometrical flexibility as edge representations. Similarly to standard foveal algorithms, this approach gathers high resolution image information at specific locations; however, as opposed to existing foveal algorithms, this information is not provided in the neighborhood of isolated points but in the neighborhood of curves or surfaces across the signal.
A trajectory is defined as a discretized surface of dimension n−1 over the signal support. In two dimensions, it is therefore a one-dimensional curve in the image and in three dimensions it is a two-dimensional surface. These trajectories may or may not correspond to the edges of the signal. To structure the representation properly, each trajectory is defined with respect to a particular direction, and each point of this trajectory is specified by a coordinate along this direction. There are n possible orthogonal directions corresponding to the directions of the n-dimensional signal array. The list of trajectories is thus called an n-directional trajectory list.
To compute the foveal coefficients, a set of one-dimensional foveal filters is predefined. Each of these filters is centered along each trajectory, pointing in the direction associated with this trajectory, and the resulting one-dimensional inner products are computed in this direction. At two different positions, the inner products are thus computed with one-dimensional vectors that do not overlap and which are therefore orthogonal. When trajectories are located in the neighborhood of edges, foveal coefficients give a very different representation from what is obtained with existing methods. The foveal filters are chosen in order to characterize the signal variations in a large neighborhood on each side of a trajectory, not just at left and right points as in S. Carlsson, “Sketch based coding of gray level images,” IEEE Transaction on Signal Processing, vol. 15, pp. 57-83, July 1988 and C.-Y. Fu and L. I. Petrich, “Image compression technique.” U.S. Pat. No. 5,615,287, the contents of which are incorporated by reference herein. Some of these foveal filters have positive and negative coefficients to compute image variations across edges as opposed to weighted averages of grey level values. By choosing foveal filters which are wavelets, a multiscale approximation of the signal transition across the trajectories is obtained.
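By way of illustration only, the computation of foveal coefficients in two dimensions may be sketched as follows. The function name `foveal_coefficients`, the integer trajectory coordinates, the clamping at the image border and the two three-tap filters are assumptions made for this sketch and are not taken from the description; the trajectory is taken to be parameterized by column, so that each filter is applied along the vertical direction.

```python
import math

def foveal_coefficients(image, trajectory, filters):
    """Inner products of 1-D filters with the image along a trajectory.

    image: list of rows; trajectory: list of (col, center_row) pairs for a
    trajectory parameterized by column; filters: odd-length 1-D filters.
    Returns coeffs[p][q] for filter p at trajectory point q.
    """
    height = len(image)
    coeffs = []
    for filt in filters:
        half = len(filt) // 2
        along_trajectory = []
        for col, center in trajectory:
            total = 0.0
            for k, weight in enumerate(filt):
                # Clamp at the image border (an assumption of this sketch).
                row = min(max(center + k - half, 0), height - 1)
                total += weight * image[row][col]
            along_trajectory.append(total)
        coeffs.append(along_trajectory)
    return coeffs

# Toy image: a horizontal step edge between rows 2 and 3.
image = [[0.0] * 4 for _ in range(3)] + [[10.0] * 4 for _ in range(3)]
trajectory = [(col, 3) for col in range(4)]
filters = [
    [1 / math.sqrt(3)] * 3,                      # hypothetical local average
    [-1 / math.sqrt(2), 0.0, 1 / math.sqrt(2)],  # hypothetical variation filter
]
coeffs = foveal_coefficients(image, trajectory, filters)
```

In this toy case the coefficients of each filter are constant along the trajectory, which is the kind of regularity that a later decorrelation step can exploit.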
Foveal wavelet coefficients characterize specifically the image variations across trajectories and not across regular image regions as in L. Bouchard and R. Askenatis, “Region-based texture coding and decoding method and corresponding system.” U.S. Pat. No. 5,898,798, in T. Echigo, J. Maeda, J.-K. Hong, and M. Ioka, “System for and method of processing digital images.” U.S. Pat. No. 5,915,046, in I. Masaki, U. Desai, A. Chandrakasan, and B. Horn, “Method and apparatus for compressing and decompressing a video image.” U.S. Pat. No. 5,790,269 or in A. Mertins, “Image compression via edge-based wavelet transform,” Opt. Eng., vol. 38, no. 6, pp. 991-1000, 1999, the contents of which are incorporated by reference herein. As opposed to the wavelet based representation in S. Mallat and S. Zhong, “Characterization of signals from multiscale edges,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 7, pp. 710-732, 1992 and in X. Xue, “Image compression based on low-pass wavelet transform and multi-scale edge compensation,” in Data Compression Conference, p. 559, 1999, the contents of which are incorporated by reference herein, a single trajectory/edge map is used instead of a family of scale dependent edge maps.
The present invention includes a trajectory finder which takes as input the n-dimensional signal and chooses the optimal location of trajectories along which to compute the foveal coefficients. The signal is filtered with one-dimensional convolutions along its lines in each of the n directions, with a set of one-dimensional trajectory filters. In each direction, the trajectory points are located with a one-dimensional energy peak detection along each line. Depending upon the particular choice of foveal filters, the trajectory points may or may not be located at edge points. A chaining procedure computes the resulting set of trajectories in each direction. A non-overlapping partition segments these trajectories to guarantee that different trajectories do not overlap. If the trajectory filters are discretized differential operators, then the trajectory finder is similar to an edge detector such as a Sobel edge detector described in A. Jain, Fundamentals Of Digital Image Processing. Englewood Cliffs: Prentice Hall, 1989 or a Canny edge detector introduced in J. Canny, “A computational approach to edge detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 8, pp. 679-698, 1986. However, for other trajectory filters, the resulting trajectories may not be edges as understood in the usual sense.
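The filtering, energy peak detection and chaining steps of the trajectory finder may be sketched, for a single direction of a two-dimensional image, as follows. This is a simplified illustration under stated assumptions: the function names, the single derivative-like filter, the energy threshold, the `max_jump` chaining rule and the border clamping are all hypothetical choices, and the non-overlapping partition across directions is omitted.

```python
def filter_line(line, filt):
    """Correlate a 1-D line with an odd-length filter, clamping at the borders."""
    half = len(filt) // 2
    n = len(line)
    return [
        sum(w * line[min(max(i + k - half, 0), n - 1)] for k, w in enumerate(filt))
        for i in range(n)
    ]

def energy_peaks(line, filt, threshold):
    """Indices where the squared filter response is a local maximum above threshold."""
    energy = [v * v for v in filter_line(line, filt)]
    peaks = []
    for i in range(1, len(energy) - 1):
        if energy[i] > threshold and energy[i] >= energy[i - 1] and energy[i] > energy[i + 1]:
            peaks.append(i)
    return peaks

def chain_columns(image, filt, threshold, max_jump=1):
    """Chain per-column peaks into trajectories parameterized by column index."""
    width = len(image[0])
    finished, active = [], []
    for col in range(width):
        column = [row[col] for row in image]
        peaks = energy_peaks(column, filt, threshold)
        next_active = []
        for chain in active:
            last_row = chain[-1][1]
            # Extend the chain with the first peak close enough to its last point.
            match = next((p for p in peaks if abs(p - last_row) <= max_jump), None)
            if match is None:
                finished.append(chain)
            else:
                peaks.remove(match)
                chain.append((col, match))
                next_active.append(chain)
        # Peaks that extend no existing chain start new trajectories.
        next_active.extend([[(col, p)] for p in peaks])
        active = next_active
    return finished + active

# Toy image: a step edge whose row position shifts by one pixel at column 2.
edge = [3, 3, 4, 4]
image = [[10.0 if r >= edge[c] else 0.0 for c in range(4)] for r in range(7)]
trajectories = chain_columns(image, [-1.0, 0.0, 1.0], threshold=1.0)
```

With a derivative-like filter, as here, the sketch behaves like a crude edge detector; other filters would place trajectory points elsewhere.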
The foveal reconstruction processor recovers a signal approximation from the foveal coefficients. This processor computes a reconstructed signal by finding a signal having foveal coefficients close to the provided foveal coefficients, and which is regular away from the trajectories. It is calculated by minimizing a combination of a positive definite constraint functional, vanishing only when the foveal coefficients of the signal are equal to the provided foveal coefficients, and a regularization functional. A foveal residue is computed as a difference between the input signal and the reconstructed foveal signal. The sharp transitions of the input signal along the trajectories have disappeared in the foveal residue, which carries global smooth variations and texture variations in areas not covered by trajectories. The whole signal information is then represented by three components: the n-directional trajectory list, the foveal coefficients that characterize the signal profile along these trajectories, and the residue.
To produce a sparse representation, a bandelet processor decorrelates the foveal coefficients by applying linear invertible operators along each trajectory. Indeed, if a trajectory follows a regular geometric signal structure, then the foveal coefficients have smooth variations along this trajectory and the linear operators take advantage of these smooth variations to perform the decorrelation. For example, these linear operators can be chosen to be decompositions in a cosine basis or in a wavelet basis, defined over the support of the trajectory. The resulting bandelet coefficients are equal to inner products of the signal with n-dimensional bandelet vectors whose supports are elongated along a trajectory. If the linear transform is an orthogonal transform, and the foveal filters are also orthogonal, then the bandelet vectors are orthogonal as well. The resulting representation is therefore perfectly stable. Moreover, along geometrical signal structures most bandelet coefficients have a negligible amplitude and can be set to zero to produce a sparse representation. The inverse bandelet processor recovers the foveal coefficients by performing an inverse bandelet transform using linear operators that are substantially the inverse of those used in the bandelet processor.
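As a minimal sketch of the decorrelation described above, the foveal coefficients of one trajectory may be transformed with an orthonormal discrete cosine basis defined over the support of the trajectory. The function names and the choice of a DCT-II basis are assumptions made for illustration; a wavelet basis could be substituted.

```python
import math

def dct_matrix(n):
    """Rows are the orthonormal DCT-II basis vectors of length n."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (q + 0.5) * k / n) for q in range(n)])
    return rows

def bandelet_transform(foveal):
    """foveal[p][q]: coefficient of filter p at trajectory point q.
    Each row is decomposed in the cosine basis along the trajectory."""
    basis = dct_matrix(len(foveal[0]))
    return [[sum(b * c for b, c in zip(vec, row)) for vec in basis] for row in foveal]

def inverse_bandelet_transform(bandelet):
    """Recover the foveal coefficients (transpose of the orthonormal basis)."""
    n = len(bandelet[0])
    basis = dct_matrix(n)
    return [[sum(basis[k][q] * row[k] for k in range(n)) for q in range(n)]
            for row in bandelet]

# Foveal coefficients varying smoothly along a trajectory of 8 points.
foveal = [[5.0] * 8, [1.0 + 0.1 * q for q in range(8)]]
bandelet = bandelet_transform(foveal)
recovered = inverse_bandelet_transform(bandelet)
```

For the constant row, only the first bandelet coefficient is non-negligible, and because the transform is orthonormal the energy of each row is preserved.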
A geometric processor also decorrelates the coordinates of each trajectory in an n-directional trajectory list by applying linear invertible operators to these coordinates. These linear operators may be transformation operators in a sine basis or in a wavelet basis. Most of the resulting coefficients then have a negligible amplitude and can be set to zero to obtain a sparse representation. The inverse geometric processor recovers the trajectories by applying linear operators that are substantially the inverse of the linear operators used by the geometric processor.
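The decorrelation of trajectory coordinates in a sine basis may be sketched as follows. Keeping the two end points and transforming only the deviation of the interior points from the straight line joining them is an assumption of this sketch, as are the function names; the sketch requires a trajectory of at least two points.

```python
import math

def dst(values):
    """Orthonormal DST-I; this transform is its own inverse."""
    m = len(values)
    scale = math.sqrt(2.0 / (m + 1))
    return [
        scale * sum(v * math.sin(math.pi * (i + 1) * (k + 1) / (m + 1))
                    for i, v in enumerate(values))
        for k in range(m)
    ]

def geometric_transform(coords):
    """Return the end points and the sine coefficients of the interior deviation."""
    n = len(coords)
    first, last = coords[0], coords[-1]
    residual = [coords[i] - (first + (last - first) * i / (n - 1))
                for i in range(1, n - 1)]
    return first, last, dst(residual)

def inverse_geometric_transform(first, last, coeffs):
    """Recover the trajectory coordinates from the end points and coefficients."""
    residual = dst(coeffs)
    n = len(coeffs) + 2
    interior = [first + (last - first) * i / (n - 1) + residual[i - 1]
                for i in range(1, n - 1)]
    return [first] + interior + [last]

coords = [3.0, 3.4, 3.9, 4.1, 4.0, 3.8]
first, last, geo = geometric_transform(coords)
recovered = inverse_geometric_transform(first, last, geo)
```

A perfectly straight trajectory yields sine coefficients that are all zero up to rounding, so only its two end points carry information.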
Signal processing procedures are efficiently implemented in a foveal representation because of its ability to provide representations that are sparse and still accurate when setting their smallest coefficients to zero. A signal compression procedure is implemented by quantizing the bandelet and geometric coefficients and by encoding them for transmission or storage applications. The foveal residue can be compressed with a state of the art transform code such as JPEG-2000 for n=2 dimensional signals. Signal restoration algorithms are implemented by applying linear or non-linear diagonal operators to the bandelet coefficients of this representation, and using a state of the art denoising procedure on the residue. Similarly, this foveal representation reduces computations for classification or matching algorithms by providing a sparse and geometrically oriented representation, on which to apply standard classification or matching techniques.
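The quantization step of the compression procedure and a diagonal thresholding operator for restoration may be illustrated as follows. The uniform quantizer, the hard threshold, the step and threshold values and the function names are hypothetical choices for this sketch; entropy coding of the quantized indices is not shown.

```python
def quantize(coeffs, step):
    """Uniform scalar quantization: map each coefficient to an integer index."""
    return [int(round(c / step)) for c in coeffs]

def dequantize(indices, step):
    """Reconstruct approximate coefficients from the quantization indices."""
    return [q * step for q in indices]

def hard_threshold(coeffs, threshold):
    """Diagonal non-linear operator: zero the coefficients of small amplitude."""
    return [c if abs(c) > threshold else 0.0 for c in coeffs]

coeffs = [14.2, -0.3, 0.1, 2.6]   # e.g. bandelet coefficients of one trajectory
indices = quantize(coeffs, step=1.0)
restored = dequantize(indices, step=1.0)
denoised = hard_threshold(coeffs, threshold=0.5)
```

Since most coefficients of a sparse representation fall below the step size or threshold, most indices are zero and can be encoded cheaply.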