1. Origin of the Invention
The invention described herein was made in the performance of work under a NASA contract, and is subject to the provisions of Public Law 96-517 (35 USC § 202) in which the Contractor has elected to retain title.
2. Field of the Invention
The invention relates to signal processing techniques, and more specifically to techniques for the enhancement of edges in a temporal signal containing information in one or more dimensions. The edge enhancing techniques taught herein are advantageously applied to television and other related applications.
3. General Description of Related Art
A wide variety of techniques have been developed for processing and filtering signals, particularly signals representing two-dimensional images. In particular, many image processing techniques are provided for enhancing the clarity of a blurred image. An image may appear blurred for a number of reasons. An original sharp image may be blurred during transmission due to noise or other factors. In other circumstances, the original image is itself insufficiently clear and techniques are employed to sharpen the original image. Even in circumstances where an image is not actually blurred, the image may appear blurred due to human visual perception considerations.
If an image is blurred or degraded by a well-understood process, such as shot noise occurring during transmission, the image can usually be enhanced by developing a model of the source of degradation, then reconstructing the original image using the model. However, in many circumstances, a source of degradation of the image cannot be modeled and, hence, the image cannot be faithfully reconstructed.
In many circumstances, a blurred or perceptually blurry image may be improved by enhancing the high frequency spatial components of the image. For example, high frequency components are usually degraded more significantly during transmission than low frequency components. Hence, enhancement of high frequency components may be effective in compensating for high frequency components lost during transmission. Moreover, as will be described in more detail below, because of human visual perception considerations, an image having enhanced high frequency components simply appears sharper than an image without enhanced high frequency components.
Accordingly, various image processing techniques have been developed for modifying or supplementing the high spatial frequency components of an image, either for the purpose of providing a perceptually clearer image or for compensating for degradation in an image caused during transmission.
In the following, several such image processing techniques are summarized and general visual perceptual considerations are described. Although the following discussion is primarily directed to two dimensional time-varying or "temporal" signals, the techniques are, unless otherwise noted, generally applicable to n-dimensional information components defined by applied temporal signals that have been sampled and converted to sample streams with level-values which define the information. For optical image processing n equals two, and the information component is a visual image.
4. Perception Considerations
It has been found that the human visual system appears to compute a primitive spatial-frequency decomposition of luminous images, by partitioning spatial frequency information into a number of contiguous, overlapping spatial-frequency bands. Each band is roughly an octave wide and the center frequency of each band differs from its neighbors by roughly a factor of two. Research suggests that there are approximately seven bands, or "channels", that span the 0.5 to 60 cycle/degree spatial-frequency range of the human visual system. The importance of these findings is that spatial frequency information more than a factor of two away from other spatial frequency information will be independently processed by the human visual system.
An important parameter of a signal processing scheme is the highest spatial frequency of interest f.sub.0. In general, the selection of f.sub.0 is based on the desired application. When the temporal signal has two dimensions and the signal defines a visual image, selection of f.sub.0 is based on human visual perception considerations. Thus, if the highest spatial frequency of interest of the image is not greater than f.sub.0, the highest frequency band will cover the octave from f.sub.0 /2 to f.sub.0 (having a center frequency at 3f.sub.0 /4); the next-to-highest frequency band will cover the octave from f.sub.0 /4 to f.sub.0 /2 (having a center frequency at 3f.sub.0 /8), and so on.
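By way of illustration only, the octave partition just described can be tabulated with a few lines of Python. The function name octave_bands and the example value of 60 cycle/degree for f.sub.0 are assumptions made for this sketch; the band centers are taken as arithmetic midpoints, which agrees with the 3f.sub.0 /4 and 3f.sub.0 /8 values given above.

```python
def octave_bands(f0, count):
    """Return (low, high, center) tuples for `count` octave bands, highest band first.

    Hypothetical helper: each band spans one octave below the previous one,
    and the center is the arithmetic midpoint of the band edges.
    """
    bands = []
    high = float(f0)
    for _ in range(count):
        low = high / 2.0
        bands.append((low, high, (low + high) / 2.0))
        high = low
    return bands


# Example value only: f0 = 60 cycle/degree, three highest bands.
bands = octave_bands(60.0, 3)
for low, high, center in bands:
    print(f"{low:6.2f} to {high:6.2f} cycle/degree, center {center:6.2f}")
```

With f.sub.0 set to 60, the highest band runs from 30 to 60 with a center at 45, which is 3f.sub.0 /4.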
It has been further found that the spatial-frequency processing that occurs in the human visual system is localized in space. Thus, the signals within each spatial-frequency channel are computed over small sub-regions of the image. These sub-regions overlap each other and are roughly two cycles wide at a particular frequency.
If a sine wave grating image is employed as a test pattern, it is found that the threshold contrast-sensitivity function for the sine wave grating image rolls off rapidly as the spatial frequency of the sine wave grating image is increased. That is, high spatial frequencies require high contrast to be seen (approximately 20% at 30 cycle/degree) but lower spatial frequencies require relatively low contrast to be seen (approximately 0.2% at 3 cycle/degree).
It has been found that the ability of the human visual system to detect a change in the contrast of a sine wave grating image that is above threshold also is better at lower spatial frequencies than at higher spatial frequencies. Specifically, an average human subject, in order to correctly discriminate a changing contrast 75% of the time, requires roughly a 12% change in contrast for a 3 cycle/degree sine wave grating, but requires a 30% change in contrast for a 30 cycle/degree grating.
The perceived inherent sharpness of an image depends on the ratio of the maximum present spatial frequency of interest f.sub.0 to the solid angle of view subtended by the image with respect to the human eye. This solid angle equals approximately the area of the image divided by the square of the viewer's distance from the image.
When an image is expanded (or enlarged) e.g. by a linear factor of two, its two-dimensional area is expanded by a factor of four. Expansion is accomplished by inserting additional pixels (or samples) in the picture. Typically the newly inserted samples are assigned level-values that are calculated to be the averages of the level-values of their neighboring samples, and the spatial frequencies are accordingly halved. Therefore, the ratio of the maximum frequency of interest f.sub.0 to the viewing angle is accordingly degraded, unless the viewer "steps back", i.e. increases the distance from the image by the same linear factor that the image was expanded by. This would decrease the solid angle and restore the perceived inherent sharpness of the image. Consequently, an image resulting from an expansion without more processing appears to a stationary viewer as having less inherent sharpness than the original, i.e., the image appears blurred. Stated otherwise, enlargement without more processing leads to an image lacking concomitant information in the higher spatial frequency bands.
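The effect of expansion by averaging can be seen in a one-dimensional sketch. The following Python fragment is an illustration under stated assumptions, not a prescribed implementation: the name expand_by_two is hypothetical, and repeating the final sample at the border is an arbitrary choice made only to keep the example short.

```python
def expand_by_two(samples):
    """Double the sample count by inserting, after each sample, the average
    of that sample and its right-hand neighbor.

    Assumption for this sketch: the last sample is repeated at the border.
    """
    out = []
    for i, value in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else value
        out.append(value)
        out.append((value + nxt) / 2.0)
    return out


original = [0.0, 4.0, 0.0, 4.0]      # alternates every sample: period of 2 samples
expanded = expand_by_two(original)   # period becomes 4 samples
print(expanded)
```

The input pattern repeats every two samples, whereas the expanded pattern repeats every four samples, so the spatial frequency per sample is halved and no new higher-frequency content is created.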
5. The Burt Pyramid Algorithm for Spatial Frequency Analysis
One example of a technique for enhancing images is the Burt Pyramid Algorithm (developed by Peter J. Burt). The Burt Pyramid Algorithm permits an original high-resolution image to be synthesized from component sub-spectra images without the introduction of spurious spatial frequencies due to aliasing. The Burt Pyramid Algorithm is particularly well-suited for both analyzing a spatial frequency spectrum of images and for synthesizing an image from its analyzed sub-spectra.
The term "pyramid" as used herein, generally relates to the successive reduction in the spatial frequency bandwidth and sample density of each of the hierarchy of component images in going from the highest octave component image to the lowest octave component image.
The Burt Pyramid Algorithm uses particular sampling techniques for analyzing a relatively high resolution original image into a hierarchy of N (where N is a plural integer) separate component images (in which each component image is a Laplacian image comprised of a different octave of the spatial frequencies of the original image) plus a remnant Gaussian image (which is comprised of all the spatial frequencies of the original image below the lowest octave component Laplacian image).
In the following, the input image is referred to as G.sub.0, the low pass filtered (LPF) versions are labeled G.sub.1 through G.sub.N with decreasing resolutions, and the corresponding edge maps are labeled L.sub.0 through L.sub.N-1 respectively.
A stage of Burt Pyramid analysis is shown in FIG. 1. An input image (denoted as G.sub.0) is input, then convolved and decimated using a convolution filter 102, to produce a filtered decimated image G.sub.1. Sub-sampling is also generally referred to as "decimating".
The convolution filter 102 is a low pass filter that exhibits spatially localized, gradual roll-off characteristics, rather than "brick wall" roll-off characteristics. More specifically, each of the low pass filters employed by a Burt Pyramid analyzer meets each of the following two constraints. First, each of these filters employs a localized, symmetric kernel weighting function composed of at least three multiplier coefficients. Second, the multiplier coefficients of the kernel weighting function provide equal contribution; that is, all nodes at a given level contribute the same total weight to nodes at the next higher level. In the case of a three-tap filter, this means that the respective values of the three weighting multiplier coefficients of the kernel function of the filter in each dimension are respectively 0.25, 0.5, and 0.25. In the case of a five-tap filter, the values of the five weighting multiplier coefficients of the kernel function of the filter in each dimension are respectively (0.25-p/2), 0.25, p, 0.25, and (0.25-p/2), where p has a positive value.
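The two kernel constraints can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the value p = 0.4 is simply an example of a positive p, and the equal-contribution test assumes decimation by two, so that the even-offset taps and the odd-offset taps must carry the same total weight.

```python
def five_tap_kernel(p):
    """Five-tap kernel (0.25 - p/2, 0.25, p, 0.25, 0.25 - p/2) from the text."""
    edge = 0.25 - p / 2.0
    return [edge, 0.25, p, 0.25, edge]


def is_symmetric(kernel):
    return kernel == kernel[::-1]


def is_equal_contribution(kernel, tol=1e-12):
    """Assuming decimation by two: taps at even offsets from the center and
    taps at odd offsets must sum to the same total weight."""
    center = len(kernel) // 2
    even = sum(w for i, w in enumerate(kernel) if (i - center) % 2 == 0)
    odd = sum(w for i, w in enumerate(kernel) if (i - center) % 2 == 1)
    return abs(even - odd) <= tol


three_tap = [0.25, 0.5, 0.25]
five_tap = five_tap_kernel(0.4)  # p = 0.4 is an example value only

for name, kernel in (("three-tap", three_tap), ("five-tap", five_tap)):
    print(name, kernel, is_symmetric(kernel), is_equal_contribution(kernel))
```

Both kernels sum to one, so the filtering leaves the average level-value of the image unchanged.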
The input image G.sub.0 is delayed by a delay element 104. The filtered decimated image G.sub.1 is re-expanded and interpolated by an expansion and interpolation filter 108. The expanded and interpolated G.sub.1 is subtracted from the delayed G.sub.0 by a subtraction element 106 to produce L.sub.0, the first order edge map, also known as a Laplacian. It is noteworthy that there are many ways in which a Laplacian can be computed. The term "Laplacian" does not inherently signify a particular method of derivation.
The filtered decimated image G.sub.1 is subsequently input to a second stage, that is similar to the first, to produce G.sub.2 and L.sub.1, etc. Iteration continues until a desired number of pyramid levels is achieved.
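One analysis stage of the kind described above may be sketched for a one-dimensional signal as follows. This is a simplified illustration rather than the implementation of FIG. 1: the function names are hypothetical, the three-tap kernel 0.25, 0.5, 0.25 is used, border samples are replicated, and the expansion step uses linear interpolation; the border rule and the interpolation rule are assumptions of this sketch.

```python
KERNEL = [0.25, 0.5, 0.25]  # three-tap low pass kernel from the text


def convolve(signal, kernel):
    """Same-length convolution; border samples are replicated (assumption)."""
    half = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - half, 0), last)
            acc += weight * signal[j]
        out.append(acc)
    return out


def reduce_level(signal):
    """Low pass filter, then keep every second sample (decimate)."""
    return convolve(signal, KERNEL)[::2]


def expand_level(signal, length):
    """Re-expand to `length` samples by linear interpolation (assumption)."""
    out = []
    for i in range(length):
        left = signal[i // 2]
        if i % 2 == 0:
            out.append(left)
        else:
            right = signal[min(i // 2 + 1, len(signal) - 1)]
            out.append((left + right) / 2.0)
    return out


def burt_stage(g0):
    """Return (G1, L0): the reduced image and the edge map for one stage."""
    g1 = reduce_level(g0)
    expanded = expand_level(g1, len(g0))
    l0 = [a - b for a, b in zip(g0, expanded)]
    return g1, l0


g0 = [0.0, 0.0, 0.0, 0.0, 8.0, 8.0, 8.0, 8.0]  # a step edge
g1, l0 = burt_stage(g0)
print("G1 =", g1)
print("L0 =", l0)
```

The edge map is zero in the flat regions and nonzero only around the step, and feeding G.sub.1 back into burt_stage yields G.sub.2 and L.sub.1.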
The Burt Pyramid algorithm may further employ a synthesis component, that works in reverse to reconstruct the original image G.sub.0 from the remnant sampled signal G.sub.N and the Laplacian sub-spectra sampled signals L.sub.0, . . . , L.sub.N-1.
The reconstruction process involves adding to a given LPF (the remnant) version of the image, G.sub.N, the band pass images, L.sub.j (j=N-1, . . . , 0), thus reconstructing the Gaussian pyramid, level by level, up to the original input image, G.sub.0. This is a recursive process as in equation (1):

$$G_j = L_j + G_{j+1}, \qquad j = N-1, \ldots, 0 \tag{1}$$
where G.sub.j+1 is expanded, via interpolation, to the G.sub.j image size prior to the addition process.
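The recursion of equation (1) may be sketched for a one-dimensional signal as follows. The function names and the sample values are hypothetical, and linear interpolation is assumed for the expansion step; exact reconstruction requires only that the same expansion be used in synthesis as was used when the edge maps were formed.

```python
def expand_level(signal, length):
    """Expand to `length` samples by linear interpolation (assumption)."""
    out = []
    for i in range(length):
        left = signal[i // 2]
        if i % 2 == 0:
            out.append(left)
        else:
            right = signal[min(i // 2 + 1, len(signal) - 1)]
            out.append((left + right) / 2.0)
    return out


def reconstruct(laplacians, remnant):
    """Apply G_j = L_j + expand(G_{j+1}) for j = N-1, ..., 0.

    `laplacians` is ordered [L_0, ..., L_{N-1}]; `remnant` is G_N.
    """
    g = list(remnant)
    for lap in reversed(laplacians):
        expanded = expand_level(g, len(lap))
        g = [l + e for l, e in zip(lap, expanded)]
    return g


# Example values only: a two-level pyramid of a step edge.
l0 = [0.0, 0.0, 0.0, -3.0, 2.0, 1.0, 0.0, 0.0]
l1 = [0.0, -2.5, 1.0, 3.0]
g2 = [0.0, 5.0]

restored = reconstruct([l0, l1], g2)
print(restored)
```

Starting from the two-sample remnant, the first pass recovers the four-sample G.sub.1 and the second pass recovers the eight-sample G.sub.0.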
Typically, the Burt Pyramid algorithm is implemented by computer in non-real time. Non-real time implementation of the Burt Pyramid algorithm by computer processing is particularly effective in processing fixed image information, such as a photograph. However, it is not particularly effective when applied to a stream of successively-occurring images continually changing in time (e.g., successive video frames of a television picture), unless special computation means are used, such as special purpose integrated circuits.
6. The Filter-Subtract-Decimate Algorithm for Spatial Frequency Analysis
An alternative to the Burt Pyramid algorithm is provided by the Filter-Subtract-Decimate (FSD) Hierarchical Pyramid described in U.S. Pat. No. 4,718,104.
A stage of the FSD pyramid analysis component is shown in FIG. 2. As can be seen from FIG. 2, the FSD technique includes similar functional components to that of the Burt Pyramid algorithm of FIG. 1. In particular, an input image G.sub.0 is filtered by a convolution filter 202. The input image G.sub.0 is also delayed by a delay element 204, and the output image from convolution filter 202 is subtracted from a delayed version of input image G.sub.0 by subtraction element 206 to produce an edge map L.sub.0.
For an input signal comprising a sample stream of certain samples carrying information, the FSD analyzing technique involves convolving the image at every sample location with a symmetrical, localized, equal-contribution, low pass filter, n-dimensional kernel function having a low-pass transmission characteristic to derive a convolved sample stream. The convolved sample stream includes filtered samples that individually correspond to each of some of the certain samples of the input stream. The level-value of each of the filtered samples is subtracted from the level-value of that individual certain sample with which it corresponds, to derive an edge map L.sub.0. Edge map L.sub.0 comprises a first output sample stream that includes information-component samples corresponding to the input certain samples that define the band pass sub-spectrum with the particular relatively high sample density.
Unlike the Burt Pyramid algorithm, convolution filter 202 of the FSD technique does not include a decimation element. Rather, a separate decimation element 205 is provided for receiving the output from convolution filter 202. Decimation element 205 operates to produce G.sub.1. Hence, decimation of the convolved image need not be performed prior to the subtraction from the delayed input image. As a result, the re-expansion and interpolation before subtraction that the Burt Pyramid algorithm requires to construct the edge map are not necessary. Therefore, the separation of the decimation element from the convolution element represents a principal improvement, diminishing the required processing time of the FSD technique relative to that of the Burt Pyramid algorithm.
In all stages of the FSD algorithm, the convolved sample stream is decimated to derive a second output sample stream that includes information-component samples corresponding to only a given sub-multiple of the certain samples that define the remnant sub-spectrum with a relatively lower sample density in each dimension than the particular relatively high sample density.
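The filter, subtract, decimate ordering may be sketched for a one-dimensional signal as follows. This is an illustration under stated assumptions and not the circuit of FIG. 2: the function names are hypothetical, the three-tap kernel 0.25, 0.5, 0.25 is used, and border samples are replicated.

```python
KERNEL = [0.25, 0.5, 0.25]  # three-tap low pass kernel


def convolve(signal, kernel):
    """Same-length convolution; border samples are replicated (assumption)."""
    half = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - half, 0), last)
            acc += weight * signal[j]
        out.append(acc)
    return out


def fsd_stage(g0):
    """Filter, subtract at full sample density, then decimate.

    Returns (G1, L0). No re-expansion or interpolation is needed because
    the subtraction happens before decimation.
    """
    filtered = convolve(g0, KERNEL)               # filter
    l0 = [a - b for a, b in zip(g0, filtered)]    # subtract
    g1 = filtered[::2]                            # decimate
    return g1, l0


g0 = [0.0, 0.0, 0.0, 0.0, 8.0, 8.0, 8.0, 8.0]  # a step edge
g1, l0 = fsd_stage(g0)
print("G1 =", g1)
print("L0 =", l0)
```

The edge map has the same sample density as the input, while G.sub.1 has half that density and serves as the input to the next stage.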
Operation of the FSD algorithm as illustrated in FIG. 2 results in the generation of a Laplacian pyramid. The Laplacian pyramid consists of band pass filtered (BPF) versions of the input image, with each stage of the pyramid constructed by the subtraction of two corresponding adjacent levels of the Gaussian pyramid. The Laplacian pyramid can also be viewed as a difference-of-Gaussians (DOG) pyramid, where the DOG kernel, which is a good approximation to the Laplacian operator, is convolved with the input at every sample location to produce corresponding edge maps.
It has been shown that the Laplacian pyramid forms a complete representation of the image for spatial frequencies ranging from 0 to a preselected maximum frequency of interest f.sub.0. Therefore, with the pyramid representation, complete reconstruction of the original image is enabled, for spatial frequencies up to f.sub.0.
In general, pyramidal filtering schemes, such as the Burt Pyramid algorithm and the FSD algorithm, exploit some of the perceptual considerations noted above to provide reasonably effective filtering for certain applications, particularly non-real time applications. However, pyramid algorithms, such as those described above, typically require a large number of computations, which limits their effectiveness for real-time applications, particularly those wherein a visual image changes with time, such as a television image. Some of these limitations are presently overcome by a pyramid chip available from David Sarnoff Laboratories.
7. Image Sharpening by Enhancing Existing High Frequencies
The prior art concentrates mostly on enhancing existing high spatial frequencies of a given input image. However, the addition of high frequencies to such images requires increased data handling for processing. Different enhancement schemes will result in images that are formally different from the original.
Enhancing existing high frequencies is performed by convolving the input image with masks, the masks having a basic high pass filter characteristic or a derivative function filter characteristic. One such technique is high boost filtering, as described in Digital Image Processing by Rafael C. Gonzalez and Richard E. Woods, 1992, pp. 195-201, published by Addison-Wesley.
"Unsharp masking" (or high frequency emphasis method) is another general technique of sharpening an image by enhancing its existing high frequencies. Unsharp masking is implemented by creating a low pass filtered version of the input image and then subtracting it from the input image to create a high pass filtered version of the image. A fraction of this high pass filtered version is subsequently added to the image. Even though this approach yields reasonably good results in many cases, some undesired noise is generated in the dark regions of images, and often the enhanced images are not visually pleasing.
Subsequent techniques replace the high pass filter step with a non-linear filter, and repeat the technique of adding a version of the image to the original image. Examples of such techniques are suggested by Mitra, as explained in the next section.
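The unsharp masking steps described in this section (low pass filter, subtract, add back a fraction) may be sketched for a one-dimensional signal as follows. The function names, the three-tap kernel, the replicated borders, and the fraction of 0.5 are all assumptions chosen for illustration.

```python
KERNEL = [0.25, 0.5, 0.25]  # example low pass kernel (assumption)


def convolve(signal, kernel):
    """Same-length convolution; border samples are replicated (assumption)."""
    half = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - half, 0), last)
            acc += weight * signal[j]
        out.append(acc)
    return out


def unsharp_mask(signal, amount):
    """Add `amount` times the high pass residue back to the input."""
    low = convolve(signal, KERNEL)                    # low pass version
    high = [s - l for s, l in zip(signal, low)]       # high pass version
    return [s + amount * h for s, h in zip(signal, high)]


step = [0.0, 0.0, 0.0, 0.0, 8.0, 8.0, 8.0, 8.0]
sharpened = unsharp_mask(step, 0.5)  # 0.5 is an example fraction only
print(sharpened)
```

The output undershoots on the dark side of the edge and overshoots on the bright side, which steepens the apparent transition while leaving the flat regions unchanged.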
8. Mitra Image Enhancement Techniques
An image enhancement scheme proposed by Mitra et al. (S. K. Mitra, H. Li, I. Lin and T. Yu, "A New Class of Nonlinear Filters for Image Enhancement," ICASSP 91, M5.1, pp. 2525-2528, 1991) can be seen in FIG. 3a. A filtered version of the input image is calculated and added back to the input image.
Filtering is performed in three stages. First, an adjustable DC bias and the input image are input in the input ports of adder 310. The output of the adder is connected to the input of the second nonlinear filtering stage 340.
The second nonlinear filtering stage 340 produces a result that is biased either in the vertical and horizontal directions, or along the diagonal directions, with the latter chosen as giving better performance. The output of the nonlinear filtering stage 340 is connected to the input of the third filtering stage 350.
The third filtering stage 350 adds a non-linearity via a multiplication of the highpass filter output by the local mean. The third filtering stage 350 formally produces a mapping. The mapping function can be seen in FIG. 3b. Horizontal axis 352 signifies the level-value of the input pixel, and vertical axis 358 signifies the level-value of the respective output pixel. The mapping line 354 indicates which level-values of output pixels are returned for which level-values of input pixels.
The mapping line 354 has the effect of adding less of the high-frequency components to the dark regions and more to the brighter ones, and can be desirable for a smoother perception of the enhanced result.
The procedure thus outlined has the undesirable effect of shifting the phase (i.e., the Zero Crossing (ZC) of the second derivative) towards the brighter region, thereby causing edges to appear slightly shifted, resulting in reduced fidelity to the input image. Further, aliasing in the output image is introduced.
9. Related Art Summary
The foregoing summarized a number of prior art image processing techniques particularly adapted for enhancing or sharpening two dimensional visual images. Although the various techniques have proven to be reasonably effective for certain applications, there is room for improvement, particularly in the fidelity of the enhanced output images with respect to time varying input images. In particular, conventional pyramid filtering techniques such as the Burt pyramid algorithm and the FSD algorithm are too computationally intensive to be effective for real time applications, unless custom made pyramid chips are used. Furthermore, the conventional pyramid techniques do not allow for the addition of spatial frequencies higher than those contained in the image being processed. As such, the conventional pyramid techniques do not fully exploit certain visual perceptual considerations, such as the perception that an image with higher spatial frequencies is a more faithful representation of a true image, regardless of the actual fidelity of the image.
The Mitra technique is somewhat more effective for sharpening an image but does not provide a resulting image with fidelity to the edges of the input image, and introduces aliasing.
As can be appreciated, it would be desirable to provide an improved image filtering technique, which can be applied to sharpen images with fidelity to the edges and no aliasing, and ideally also in real time to sharpen time varying images.
Further, since image expansion results in blurring, it would be desirable to provide an improved image filtering technique to sharpen expanded images.