It will be appreciated that photographic images and images recorded on electronic media are plagued with noise and other degradations, including blur. For many applications, one wants to enhance the image so as to bring out its features while at the same time suppressing the noise and other artifacts that degrade it. Television images and photographs are usually degraded in some way: they are blurry because objects are out of focus or moving, or because of the inherent resolution limits of the television signal itself; and they are noisy because of pronounced film grain or electronic noise. The problem is particularly severe in format conversion from the low resolution standard NTSC television format up to high resolution HDTV.
The problem is that, given the image data which is received, one wishes to obtain the best possible picture, which in some instances means that the picture is to be sharpened, that noise is to be eliminated, or that the image is to be synthesized in another form that results in a higher resolution than that of the image from which the enhanced image is formed.
In the past, classical methods for image enhancement have included the so-called Wiener filtering technique, which is an entirely spectral method that involves taking a Fourier transform of an image and generating a linear filter function to modify the Fourier transform coefficients by either increasing them or decreasing them at every frequency. This is a so-called "global" approach to enhancement.
This classical method does not provide sufficiently enhanced results in most cases because it assumes that image statistics are stationary, meaning that every patch of the image is generated by the same random process as is every other patch of the image. However, image structure does not derive from a stationary process. That is, if one were to look at random noise on a screen, one could conclude that this is generated by a stationary process. However, if one were to look at another region of the image where an edge might exist, and another region in which there might be a blank or absence of signal, and in another region where there might be a line, the underlying process can be seen to change from patch-to-patch or even from point-to-point.
More specifically, with Wiener filtering a degraded image is modeled as an uncorrupted image to which noise has been added. One then needs to estimate the power spectrum of the uncorrupted image and of the noise. The Wiener filter then specifies a modulation for each spatial frequency of the image. If the statistics of the image and the noise are stationary, that is, constant over the image, then of all linear filters the Wiener filter yields the processed image with the smallest possible squared difference from the uncorrupted image. Unfortunately, image statistics are typically non-stationary, and conventional Wiener filtering does not work well.
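By way of illustration only, the per-frequency modulation described above can be written in its standard textbook form. The symbols are notation introduced here rather than taken from the source: $Y$ is the transform of the degraded image, $S$ and $N$ those of the uncorrupted image and the noise, $P_s$ and $P_n$ their power spectra, and $H$ the filter gain. The signal and the noise are assumed to be uncorrelated.

```latex
Y(\omega) = S(\omega) + N(\omega), \qquad
H(\omega) = \frac{P_s(\omega)}{P_s(\omega) + P_n(\omega)}, \qquad
\hat{S}(\omega) = H(\omega)\, Y(\omega)
```

Because $P_s$ and $P_n$ are single spectra estimated for the whole image, the gain $H$ is the same at every image position, which is the sense in which the approach is global.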
In contradistinction to the global Wiener approach, there are local approaches that include median weighting. For example, a 3×3 neighborhood in an image would require replacing the center pixel in the 3×3 neighborhood with, for instance, the median of the nine pixels. However, such processes also do not work very well.
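The 3×3 median replacement just described can be sketched as follows. This is a minimal illustration in plain Python, assuming an image stored as a list of rows; the function name `median_filter_3x3` and the choice to copy border pixels unchanged are assumptions made here, not details from the source.

```python
from statistics import median


def median_filter_3x3(image):
    """Replace each interior pixel with the median of its 3x3 neighborhood."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]  # border pixels are copied unchanged (assumption)
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Gather the nine pixels centered on (r, c).
            window = [image[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = median(window)
    return out


# A flat patch with one isolated noisy pixel.
noisy = [
    [10, 10, 10, 10],
    [10, 99, 10, 10],
    [10, 10, 10, 10],
]
filtered = median_filter_3x3(noisy)
```

The isolated outlier is removed because it never reaches the middle of the sorted window. The same rule is applied at every pixel regardless of whether a line or edge passes through the neighborhood.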
The reason that such localized approaches do not work very well is that they introduce unpleasant artifacts. One of the reasons for the unpleasant artifacts is that such a system does not take advantage of the local orientation structure of the image. Image information tends to be locally oriented, meaning that it is made up of lines and edges, yet the above method does not take orientation into account. Much less does it take into account the multi-scale nature of image information, meaning that true image information can be better distinguished from noise by analyzing the results of spatial filtering in various spatial frequency bands.
Thus, images can be thought of as having information at many scales and/or frequency bands, so that fine details are in the higher frequency band, whereas the coarse details are in the lower frequency band.
While not a direct answer to the problem of including local orientation in the enhancement of an image, so-called coring techniques have previously been utilized to remove noise that exists on either side of an edge transition, which is a sharp transition from dark to light in the image. One possible way to smooth out this noise is to simply replace each data point with an average of three adjacent pixels, so that there is a blurring operation to smooth out the noise. However, by smoothing out the noise, one also smooths out the edge that is desired.
A so-called coring operation such as illustrated in U.S. Pat. No. 4,523,230 involves the subdivision of the original image into a high frequency band and a low frequency band. The signals in the low frequency band are left unprocessed. In the high frequency band, the band generally associated with lines and edges, any signal whose value lies between a plus and a minus threshold, the so-called core, is set to zero. Signals outside the plus and minus thresholds are given full weight, so that the only components retained are those outside the core.
Coring of this nature results in an enhanced image in that when the output of the coring means is summed with the lowpass components, the entire image is produced with edges enhanced, and more importantly with the noise to either side of a transition from a dark portion of the image to a light portion of the image being attenuated. This sharpens the edge. Other techniques are utilized to shape the thresholding condition so as to remove artifacts which exist when signals are approximately at a threshold. These artifacts can show up in television signals as apparent flickering along an edge or line.
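A two-band coring operation of this general kind can be sketched in one dimension as follows. This is an illustrative sketch only, not the circuit of the cited patent: the 3-tap averaging lowpass filter, the edge handling, and the names `lowpass3`, `core` and `core_signal` are all assumptions made here.

```python
def lowpass3(signal):
    """3-tap moving average; edge samples are repeated at the borders."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0
            for i in range(len(signal))]


def core(value, threshold):
    """Hard coring: zero inside the +/- threshold, full weight outside it."""
    return value if abs(value) > threshold else 0.0


def core_signal(signal, threshold):
    """Split into low and high bands, core the high band, then sum the bands."""
    low = lowpass3(signal)
    high = [s - l for s, l in zip(signal, low)]   # highpass residual
    cored = [core(h, threshold) for h in high]    # low band is left unprocessed
    return [l + h for l, h in zip(low, cored)]


# A dark-to-light step with small fluctuations on either side of the edge.
samples = [0, 1, -1, 0, 100, 101, 99, 100]
enhanced = core_signal(samples, threshold=5.0)
```

Each high-band sample is modified using only its own value, which is the property the later paragraphs identify as a limitation of coring schemes.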
One problem associated with traditional coring is that it deals only with one dimension. It does not take advantage of the two-dimensional structure of the image. Another problem is that it does not deal with the multi-scale aspect of image information. Splitting the image into a number of bands, more than just a highpass and a lowpass band, is referred to as a multi-scale decomposition in which the image is split into a number of adjacent spatial frequency bands, with processing or modification taking place on the information within these multiple bands or scales.
The bands that are referred to herein are logarithmically-spaced, multi-frequency bands. So-called "pyramid-type" processing allows one to minimize processing time and hardware while at the same time breaking up an image into different scales or frequency bands. In general, it will be appreciated that as the original image is successively subsampled, the smaller the image size, the lower the spatial frequency band it represents. Thus, for the highest spatial frequency band, one utilizes the smallest scale representation of the image, with the most samples, whereas for the lowest passband, one utilizes the largest scale, with the fewest samples.
In U.S. Pat. No. 4,523,230, Carlson et al. describe a technique for coring within a Laplacian pyramid. There are a number of problems with the above-mentioned pyramid scheme. One problem is that in the Laplacian pyramid technique the analysis filters used for building the pyramid are bandpass filters, whereas the synthesis filters used for reconstruction of the image are lowpass filters. This asymmetry in the mathematics results in broadband changes to the image which are unwanted. For instance, in the context of a transform, the result of convolution of an image with a filter is represented as an x, y array of coefficients. For enhancement, one modifies the value of each coefficient. Then, one reconstructs the image after this coefficient modification through the use of synthesis filters. When different types of filters are used for analysis and synthesis, as in the Laplacian pyramid above, changing a coefficient in one of the images of the pyramid, even though that image is supposed to represent a certain frequency band, causes error energy to spread across many bands when the image is reconstructed.
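The pyramid structure referred to above can be sketched in one dimension as follows. This is a simplified illustration under stated assumptions: it uses a [1, 2, 1]/4 blur and sample-repeat expansion rather than the filters of any cited reference, and the names `build_laplacian_pyramid` and `reconstruct` are hypothetical.

```python
def blur(signal):
    """Lowpass with a [1, 2, 1] / 4 kernel, repeating edge samples."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [(padded[i] + 2 * padded[i + 1] + padded[i + 2]) / 4.0
            for i in range(len(signal))]


def downsample(signal):
    """Keep every second sample."""
    return signal[::2]


def upsample(signal, length):
    """Expand to `length` samples by repeating each sample (assumption)."""
    return [signal[i // 2] for i in range(length)]


def build_laplacian_pyramid(signal, levels):
    """Return bands from finest to coarsest; the last entry is the lowpass residual."""
    bands = []
    current = [float(x) for x in signal]
    for _ in range(levels):
        smaller = downsample(blur(current))
        predicted = upsample(smaller, len(current))
        bands.append([c - p for c, p in zip(current, predicted)])
        current = smaller
    bands.append(current)
    return bands


def reconstruct(bands):
    """Expand the coarsest level and add back each band in turn."""
    current = bands[-1]
    for band in reversed(bands[:-1]):
        expanded = upsample(current, len(band))
        current = [b + e for b, e in zip(band, expanded)]
    return current


signal = [float(i % 5) for i in range(16)]
pyramid = build_laplacian_pyramid(signal, levels=3)
band_lengths = [len(band) for band in pyramid]
```

The finest band carries the most samples and each coarser band half as many. Unmodified coefficients reconstruct the input, whereas a change made to a coarse-level coefficient passes through every later expansion step on the way back to full resolution.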
The second problem with such a technique is that there isn't any dependence of the correction factors utilized in the enhancement on any orientational structure.
Assuming that one is looking at an edge, and assuming one wishes to break it out not only into bands corresponding to frequency but also as to orientation, one would like to have filters that would pull out for instance a vertical high frequency band corresponding to vertical detail and a band which would be a medium-frequency diagonal band. However, these types of functions are not obtainable with the conventional Laplacian and Gaussian pyramids. The problem therefore with the prior art pyramids, being Laplacian or Gaussian is that circularly symmetrical bands are all that are achievable with highpass and lowpass filtering. Thus, they have no directional component.
A third problem and one that is a problem with all coring schemes is the problem of how one is to modify the value of a coefficient. The way coring schemes work is that one breaks the signal up into a number of images. These images each contain a set of coefficients. One then modifies the value of each coefficient with a look-up table. This means that the only information one is using to modify that coefficient is the value of the coefficient itself. In short, the coring systems modify the value for the cored sample without any knowledge of any of the surrounding coefficient values.
Another type of image enhancement system is a system developed by Bayer and Powell which is also a coring system. Further it is a multi-band oriented coring system, although they do not use a pyramid-type structure. It is multi-scale and utilizes a unique method for decomposing an image using filter kernels that are in the form of doublets. Each of the kernels is represented by a "1" at one position and a "-1" at another position.
While the Bayer and Powell system results in increased enhancement because it utilizes orientation, one of the major problems with the system is that the filters are not very sharply oriented. They are primitive filters from the standpoint of extracting information about edges and lines.
However, the most persistent problem with Bayer and Powell, as well as with other coring systems, is that the system utilizes only the value of the coefficient itself in the derivation of the modification of the particular coefficient.
In summary, the aforementioned coring method has been extended by Bayer and Powell to include multiple orientations and scales, and by Carlson et al. to include pyramids. Powell and Bayer describe using gradient filters to break up the image into oriented components. When gradient operators for eight different orientations are added together, they form a sharpening filter. However, each of the gradient outputs is cored independently before being added together to form the sharpened output. This oriented filtering tends to remove noise while leaving desired oriented structures intact. Powell and Bayer describe performing this operation in a multiplicity of spatial scales by blurring the image and applying the cored-sharpening operation on expanded filters.
Carlson et al. employ related processing, using a pyramid data structure. Carlson et al. propose converting the image to a Laplacian pyramid, coring the individual Laplacian pyramid coefficients and reconstructing an enhanced image from the altered coefficients. Like the Powell and Bayer approach, this removes noise over a variety of spatial scales. It suffers, however, from two major drawbacks. First, the image representation is not tuned for orientation and so it does not exploit the oriented character of image structure. Second, because the Carlson et al. system is not self-inverting, errors in one subband introduced by coring appear as errors in a different subband. This can cause artifacts in the reconstructed image.
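The statement that eight oriented gradient operators sum to a sharpening filter can be checked with a small sketch. It assumes, for illustration, that each doublet places the "1" at the center of a 3×3 kernel and the "-1" at one of the eight neighboring positions; the source specifies only that each kernel has a "1" at one position and a "-1" at another, and the names below are hypothetical.

```python
# Offsets from the center to each of the eight neighboring positions.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
           (0, -1),           (0, 1),
           (1, -1),  (1, 0),  (1, 1)]


def doublet_kernel(dr, dc):
    """3x3 kernel with +1 at the center and -1 at offset (dr, dc)."""
    kernel = [[0] * 3 for _ in range(3)]
    kernel[1][1] = 1
    kernel[1 + dr][1 + dc] = -1
    return kernel


def sum_kernels(kernels):
    """Element-wise sum of a sequence of 3x3 kernels."""
    total = [[0] * 3 for _ in range(3)]
    for kernel in kernels:
        for r in range(3):
            for c in range(3):
                total[r][c] += kernel[r][c]
    return total


combined = sum_kernels(doublet_kernel(dr, dc) for dr, dc in OFFSETS)
# combined is [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]
```

Under this assumption the sum is the familiar center-weighted highpass kernel, whose output sharpens the image when added back to it. Coring each doublet output before summing, as described above, lets weak responses in one orientation be suppressed without discarding a strong response in another.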
In a further attempt to increase enhancement, in a work by H. Knutsson, R. Wilson and G. H. Granlund, IEEE Transactions on Communications, 31(3):388-397, 1983, a spatially adaptive filtering system is presented in which the investigators break the image into three different spatial filter bands. One is the original image unchanged. Another is an isotropically-blurred version of the original image, and the third is an image which is blurred along whatever the local orientation is at each particular point in the image. The output of this system is therefore an adaptively filtered image, formed as a linear combination of the three images. In this system, one decides what linear combination of the three to take based on the local image structure. If the image is oriented in a particular direction, Knutsson et al. take the image blurred along the local orientation; if one is at a corner, they take the original image; and if one is at a flat region, they take the isotropically blurred version.
The first problem with this system, based on published pictures, is that it is not multi-scale, so that it looks at things only in a particular spatial scale. Secondly, the result looks too "painterly" in that it looks as if the image was gone over with a paint brush. Whatever the original orientation was, the system filters along that direction, so that all the oriented structures are emphasized as with a paintbrush. The problem therefore is that its fidelity with the original image is suspect. It also introduces some interesting artifacts in that when the enhancement level is set too high, the entire image looks unnatural. In summary, Knutsson et al. have augmented adaptive Wiener filter approaches by including orientation analysis. Their proposed image is a linear combination of three images: the original image, a version which is blurred along the direction of the local image orientation, and an isotropically-blurred version. By including the image which is blurred along the local orientation, they are able to remove noise along edges better than previous methods.
With respect to further background, articles relating to image enhancement are as follows:
J. F. Abramatic and L. M. Silverman. Non-linear restoration of noisy images. IEEE Pat. Anal. Mach. Intell., 4(2):141, 1982; E. H. Adelson, E. Simoncelli, and R. Hingorani. Orthogonal pyramid transforms for image coding. In Proc. SPIE--Vis. Comm. and Image Proc. II, pages 50-58, Cambridge, Mass., 1987; B. E. Bayer. Image processing method using a collapsed Walsh-Hadamard transform. U.S. Pat. No. 4,549,212, October 1985; B. E. Bayer and P. G. Powell. A method for the digital enhancement of unsharp, grainy photographic images. In T. S. Huang, editor, Advances in Computer Vision and Image Processing, volume 2, chapter 2. JAI Press Inc., Greenwich, Conn., 1986; P. J. Burt and E. H. Adelson. The Laplacian pyramid as a compact image code. IEEE Trans. Comm., 31(4):532-540, 1983; C. R. Carlson, E. H. Adelson, and C. H. Anderson. System for coring an image-representing signal. U.S. Pat. No. 4,523,230, June 1985; W. T. Freeman and E. H. Adelson. Steerable filters for early vision, image analysis, and wavelet decomposition. In Proc. 3rd Intl. Conf. Computer Vision, Osaka, Japan, 1990. IEEE; W. T. Freeman and E. H. Adelson. The design and use of steerable filters for image analysis, enhancement, and multi-scale representation. IEEE Pat. Anal. Mach. Intell., August 1991; R. C. Gonzalez and P. Wintz. Digital Image Processing. Addison-Wesley, 1977; M. Kass and A. P. Witkin. Analyzing oriented patterns. In Proc. Ninth IJCAI, pages 944-952, Los Angeles, Calif., August 1985; H. Knutsson and G. H. Granlund. Texture analysis using two-dimensional quadrature filters. In IEEE Computer Society Workshop on Computer Architecture for Pattern Analysis and Image Database Management, pages 206-213, 1983; H. Knutsson, R. Wilson, and G. H. Granlund. Anisotropic non-stationary image estimation and its applications: Part 1--Restoration of noisy images. IEEE Trans. Comm., 31(3):388-397, 1983; J. S. Lee. Digital image enhancement and noise filtering by use of local statistics. IEEE Pat. Anal. Mach. Intell., 2(2):165, 1980; R. P. Lippmann. An introduction to computing with neural nets. IEEE ASSP Magazine, pages 4-22, April 1987; S. G. Mallat. A theory for multi-resolution signal decomposition: the wavelet representation. IEEE Pat. Anal. Mach. Intell., 11(7):674-693, 1989; R. H. McMann and A. A. Goldberg. Improved signal processing techniques for color television broadcasting. J. SMPTE, 77:221-228, 1969; T. Poggio and F. Girosi. A theory of networks for approximation and learning. Artificial Intelligence Lab. Memo 1140, Massachusetts Institute of Technology, Cambridge, Mass. 02139, 1989; P. G. Powell. Image gradient detectors operating in a partitioned lowpass channel. U.S. Pat. No. 4,446,484, May 1984; P. G. Powell and B. E. Bayer. A method for the digital enhancement of unsharp, grainy photographic images. In IEE International Conference on Electronic Image Processing, pages 178-183, 1982. no. 214; E. P. Simoncelli and E. H. Adelson. Subband transforms. In J. W. Woods, editor, Subband Image Coding, chapter 4. Kluwer Academic Publishers, Norwell, Mass. 1990; E. P. Simoncelli, W. T. Freeman, E. H. Adelson, and D. J. Heeger. Wavelet image transforms with continuous parameterization. Vision and Modeling Technical Report 161, The Media Lab, MIT, 20 Ames St., Cambridge, Mass. 02139, 1991; H. J. Trussell. A fast algorithm for noise smoothing based on a subjective criterion. IEEE Trans. Systems, Man, Cybern., 7(9):678, 1977; J. W. Woods and S. D. O'Neil. Subband coding of images. IEEE Trans. Acoust., Speech, Signal Proc., 34(5):1278-1288, 1986. A further patent relating to truncated subband coding of images is U.S. Pat. No. 4,817,182.
What will be appreciated with the above-mentioned prior systems is that none of the systems provides for modification of coefficients derived through a convolution of the image with a filter in which the modifier is a function of both the coefficient value and related coefficient values.
Related coefficients, as referred to herein, are the values of neighboring coefficients, namely those coefficients at positions other than the position of the coefficient to be modified; the value of the coefficient for different scales; or the value of the coefficient for different orientations. What this means is that in the prior art, no local structure of a neighborhood of coefficients was taken into account when the coefficient value was modified for enhancement purposes.
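The distinction between a modifier that depends only on the coefficient itself and one that also depends on related coefficients can be illustrated with a deliberately simple sketch. The neighborhood rule below is hypothetical and is not the method of the source: it keeps a coefficient when the mean magnitude of that coefficient and its two immediate neighbors in a one-dimensional band exceeds the threshold.

```python
def pointwise_core(coeffs, threshold):
    """Prior-art style: each coefficient is judged by its own value alone."""
    return [c if abs(c) > threshold else 0.0 for c in coeffs]


def neighborhood_core(coeffs, threshold):
    """Hypothetical rule: keep a coefficient when the mean magnitude of it
    and its immediate neighbors exceeds the threshold."""
    n = len(coeffs)
    out = []
    for i, c in enumerate(coeffs):
        window = coeffs[max(0, i - 1):min(n, i + 2)]
        local = sum(abs(w) for w in window) / len(window)
        out.append(c if local > threshold else 0.0)
    return out


# An isolated strong coefficient (index 2) and a run of three weaker,
# mutually consistent coefficients (indices 5 to 7).
coeffs = [0.0, 0.0, 5.0, 0.0, 0.0, 3.0, 3.0, 3.0, 0.0]
by_value = pointwise_core(coeffs, 1.8)
by_neighborhood = neighborhood_core(coeffs, 1.8)
```

In this example no single pointwise threshold keeps the run of three while rejecting the isolated coefficient, because the isolated value is the larger of the two. The neighborhood rule separates them because it uses the surrounding values as evidence of structure.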
Also none of the prior art teaches the use of self-inverting analysis and synthesis filters in a multi-scale environment. It will be noted that a self-inverting transform is one which utilizes analysis filters followed by synthesis filters in which the synthesis filters are identical to the analysis filters.
As a subsidiary issue, it is important that the transform and its filters be both steerable and multi-scale, a combination not shown in the prior art. It is also important, at least for signal processing efficiencies, that a pyramid arrangement be utilized which involves the utilization of subsampling. Further, it is also important that these filters be substantially non-aliasing.
It is also desirable that both the analysis filters and the reconstruction or synthesis filters have a relatively small spatial extent. This is to eliminate ringing. To accomplish this, it is necessary that the variations of the filters in frequency be relatively smooth.