The present invention relates to image processing and, more particularly, to the detection of edges, lines, and other linear features in two-dimensional discrete images utilizing a sequential detection technique.
Edge detection represents one of the first processing steps in a great many computer vision and image processing tasks. Reflecting this importance, the literature devoted to the problem is enormous. One need only consult a recent bibliography such as A. Rosenfeld, "Picture processing: 1983", CVGIP, vol. 26, pp. 347-393, June 1984, to gain an appreciation of this fact. Many tasks in computer vision, pattern recognition, and image processing depend upon the successful execution of this step.
Consider two-dimensional digital images. By digital it is meant that the image is discrete in the spatial domain, i.e. the image intensity function is not continuous over the two dimensions but defined only on an array of points, and the intensity levels at these points are furthermore quantized into a fixed, finite number of levels.
The underlying assumption in this and many other treatments of edge detection is that edges of interest in real scenes, such as object boundaries, are represented in an image as a discontinuity in intensity. Therefore the task of edge detection becomes one of identifying intensity discontinuities. The human visual system and perception are such that discontinuities are not the only intensity functions that are perceived as "edges." Other possibilities include discontinuities in the first derivative of the intensity function (ramp edges), texture edges, and color changes. While these features are important in some contexts, and are in fact sufficiently similar to the problem of intensity discontinuities to perform their detection by sequential techniques, this discussion is confined solely to the first problem. It should be noted, however, that sequential algorithms attempt to exploit only the connectivity of edges and are largely independent of the specific edge operator used.
Given that the goal is to identify intensity discontinuities, two general classes of techniques have emerged to address this problem: gradient-type operators and parametric models. Gradient-type operators, which for the purposes of this discussion will include first and second order spatial derivative operators, are discrete spatial filters whose magnitude responses have various high pass characteristics. In particular, they attempt to perform the discrete equivalent of a two-dimensional gradient, a directional derivative operator, or a second order derivative operator. The idea is to emphasize those regions of an image where the intensity function changes rapidly with distance and suppress the areas with little change in intensity. These operators may also provide information regarding the direction of the gradient, or in the case of directional operators, the component of the gradient in a given direction.
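As an illustration of a gradient-type operator, the following minimal sketch applies the 3x3 Sobel kernels to a small image and returns the gradient magnitude and direction at each interior pixel. The Sobel kernels are only one common choice and are an assumption here; the function name `sobel_gradient` and the test image are hypothetical.

```python
import math

# 3x3 Sobel kernels: one common gradient-type operator (an illustrative choice).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_gradient(image):
    """Return (magnitude, direction) arrays for the interior pixels of `image`.

    `image` is a list of equal-length rows of intensity values. Border pixels,
    where the 3x3 window does not fit, are left at zero.
    """
    rows, cols = len(image), len(image[0])
    magnitude = [[0.0] * cols for _ in range(rows)]
    direction = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    pixel = image[r + i - 1][c + j - 1]
                    gx += SOBEL_X[i][j] * pixel
                    gy += SOBEL_Y[i][j] * pixel
            magnitude[r][c] = math.hypot(gx, gy)   # strength of the intensity change
            direction[r][c] = math.atan2(gy, gx)   # gradient direction in radians
    return magnitude, direction

# A 5x6 test image with a vertical step edge between columns 2 and 3.
step_image = [[10, 10, 10, 50, 50, 50] for _ in range(5)]
mag, ang = sobel_gradient(step_image)
```

On this test image the magnitude is large only at the two columns adjacent to the step and zero in the flat regions, which is the emphasis and suppression behavior described above.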
Generally speaking, gradient-type operators are implemented with one of a variety of window functions. This is due to the fact that real edges are of finite extent and therefore the operator must have finite support, i.e. be windowed. If the window function is rectangular, the spectral response of the operator will exhibit the familiar Gibbs phenomenon of Fourier theory. The large gain at high spatial frequencies exacerbates the effects of noise on the output. As in other signal processing applications, the answer to this problem is to employ smoother window functions such as Hamming, Hanning, or Gaussian windows.
Parametric models view the image intensity function as a surface, and this surface is projected onto a set of basis functions. From this modeled surface, edge parameters such as slope, position, and direction are estimated. A question of importance is the completeness of the basis function set, as the parameters can be estimated only from the projection of the actual image onto the space spanned by that set. For the purposes of this discussion we will include in this class the moment-type operators. These methods attempt to detect intensity discontinuities from the moments of the distribution of intensity values in a window.
Practically all edge detection schemes proposed to date in the above classes involve a classification step that utilizes one or more thresholds. Having obtained estimates of the gradient magnitude and direction, or edge parameters from a fitted model, some mechanism must be employed to decide whether or not those quantities indicate the presence of an intensity edge at the location in question. This classification step is performed via a decision threshold. Even the second order derivative approach cannot avoid this step. Although zero crossings of the two-dimensional Laplacian nominally indicate intensity edges, even small amounts of noise contribute to a very high density of noise-induced zero crossing contours. Therefore, practical implementations must apply a threshold to the slope of the second derivative perpendicular to the zero crossing.
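The need for a slope threshold on zero crossings can be seen in a one-dimensional sketch. The function below is a simplified stand-in for the two-dimensional Laplacian case: it takes the second difference of a signal, finds sign changes, and keeps only those whose jump across the crossing meets a threshold. The name `zero_crossing_edges`, the slope measure, and the sample signal are assumptions made for illustration.

```python
def zero_crossing_edges(signal, slope_threshold):
    """Locate edges in a 1-D signal from zero crossings of its second difference.

    A sign change between adjacent second-difference samples is kept only when
    the jump across the crossing is at least `slope_threshold`.
    """
    # Discrete second derivative: d2[k] corresponds to signal position k + 1.
    d2 = [signal[k] - 2 * signal[k + 1] + signal[k + 2]
          for k in range(len(signal) - 2)]
    edges = []
    for k in range(len(d2) - 1):
        if d2[k] * d2[k + 1] < 0:                  # sign change: a zero crossing
            slope = abs(d2[k + 1] - d2[k])
            if slope >= slope_threshold:
                # The crossing lies between signal positions k + 1 and k + 2.
                edges.append(k + 1)
    return edges

# A step from about 10 to about 50 between positions 3 and 4, with slight noise.
noisy_step = [10, 11, 10, 10, 50, 50, 51, 50]
all_crossings = zero_crossing_edges(noisy_step, 0)    # [1, 3, 4, 5]
kept_crossings = zero_crossing_edges(noisy_step, 50)  # [3]
```

With no threshold, the small noise perturbations produce several spurious crossings; with the slope threshold, only the crossing at the true step survives.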
This thresholding process may be accomplished in various ways. Earlier techniques established a global threshold on the basis of the histogram of operator outputs or the Receiver Operating Characteristic (ROC) of the operator. More recent methods select the threshold in an adaptive manner based on local image content or on entropy ideas.
The question of threshold selection raises two fundamental and related problems with all of these edge detection techniques. The first of these is known as streaking. This phenomenon results from the fact that real images are highly non-homogeneous and edge parameters may change substantially even along the same edge contour. Regardless of the sophistication of the threshold selection process, it is possible and in fact common that due to noise, the operator output is at times above and other times below the decision threshold along the length of a given edge contour. This results in an edge map in which edges that are in reality a single connected contour are only partially detected. The broken segments or streaks are a major concern since many processing tasks that follow edge detection require well connected or even closed edge contours. On the other hand, if the thresholds are set so liberally that the edges of interest are detected with good connectivity, then many false detections and multiple responses on strong edges occur. This is the classical detection theory trade-off between probability of detection and probability of false alarm. Not only is it difficult to decide on a threshold, it is fundamentally impossible to simultaneously achieve a high detection probability and low false alarm rate as the signal-to-noise ratio decreases.
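Streaking can be illustrated with a small sketch. Assume a list of operator outputs sampled in order along a single true edge contour (the values below are made up for illustration); the hypothetical helper `detected_segments` reports the runs of consecutive points that pass a given threshold.

```python
def detected_segments(responses, threshold):
    """Split a contour into the runs of consecutive points that pass the threshold.

    `responses` holds operator outputs sampled in order along one true edge.
    Returns a list of (start, end) index pairs, inclusive.
    """
    segments, start = [], None
    for i, value in enumerate(responses):
        if value >= threshold and start is None:
            start = i                              # a detected run begins
        elif value < threshold and start is not None:
            segments.append((start, i - 1))        # the run is broken here
            start = None
    if start is not None:
        segments.append((start, len(responses) - 1))
    return segments

# Illustrative operator outputs along one connected edge, fluctuating with noise.
responses = [9, 8, 4, 7, 9, 3, 5, 8, 9, 6]
streaked = detected_segments(responses, 6)    # [(0, 1), (3, 4), (7, 9)]
connected = detected_segments(responses, 3)   # [(0, 9)]
```

At the stricter threshold the single contour is reported as three disconnected streaks. The lower threshold restores connectivity along the edge, but in a full image it would also admit more false detections away from the edge, which is the trade-off described above.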
A second and related problem is the performance at low signal-to-noise ratio. Since operators attempt to make a decision based only on local information, as the noise power increases, this decision becomes increasingly more difficult to make. The solution generally adopted is to increase the number of observations that contribute to the decision, i.e. make the operator larger. This improves the output signal-to-noise ratio of the operator but only at the expense of spatial resolution. The way to circumvent that problem is to employ a set of directional operators: long skinny operators that pick up their additional observations along an edge rather than out in every direction. This may only be taken so far, however, because the more directional the operator, the larger the requisite set of such operators. Additional complications arise from the fact that edges in real images tend not to run straight for very long. This mandates the inclusion of curved operators which further compounds the job of choosing an operator set.
Classical detection theory states that the way to improve performance at low signal-to-noise ratio is to increase the number of observations contributing to the decision. As we have just seen, simply increasing the size of the edge operator is moderately successful in this regard but at the expense of spatial resolution. In addition, the desirable "edge information" generally lies in the vicinity of the edge itself, so picking up observations far from the edge contributes little to the decision process. Using directional operators improves the output signal-to-noise ratio while maintaining good spatial resolution, but this approach soon becomes unwieldy as the number of such operators increases geometrically with their length.
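Both halves of this trade-off can be written down in a short derivation. This is a sketch under assumptions not stated in the text: each of the N observations is modeled as x_i = A + n_i, where A is the edge amplitude and the n_i are independent, zero-mean noise samples of variance \sigma^2; b denotes a hypothetical number of ways a directional operator may bend at each pixel along its length L.

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\operatorname{Var}(\bar{x}) = \frac{\sigma^2}{N}, \qquad
\mathrm{SNR}_{\text{out}} = \frac{A^2}{\sigma^2 / N} = N \cdot \frac{A^2}{\sigma^2}

\text{number of distinct operators of length } L \;\sim\; b^{\,L-1}
```

Under these assumptions the output signal-to-noise ratio grows only linearly with the number of observations, while the number of operator shapes needed to collect them along an arbitrary edge grows geometrically.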
One possible way out of this dilemma is to assemble observations along the edge contour. Observations are made in a very long and narrow path that is deformed to lie along the edge, including curves, corners, and straight segments. Since the set of all possible such paths is enormous, the paths are instead grown in an iterative fashion beginning at a point that is known to lie on the edge. This is the basic philosophy behind sequential edge detection. A searching algorithm attempts to hypothesize possible edge topologies or paths. These paths are extended iteratively, with the current most probable path extended by one observation at each iteration.
For this technique to succeed in finding edges, a means of comparing all paths hypothesized so far has to be provided. This comparison is accomplished by associating with each path a statistic called a path metric which reflects the given path's probability of coinciding with the edge contour. Therefore, only the most likely paths are extended by the searching algorithm. In this way an exhaustive search is avoided.
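The search just described can be sketched as a best-first procedure over an ordered list of hypothesized paths. The sketch below is only an illustration: the path metric (a sum of operator responses minus a constant `bias` per pixel), the restriction to left-to-right paths, and the function name `sequential_edge_search` are all assumptions, not the metric or search of any particular method discussed here.

```python
import heapq

def sequential_edge_search(response, start_row, bias):
    """Grow one edge path left to right across `response` by best-first search.

    `response` is a 2-D list of edge-operator outputs. A path starts at
    (start_row, 0) and each extension moves one column right, to the same row
    or an adjacent one. The path metric is the sum of (response - bias) over
    the path's pixels; `bias` is a hypothetical tuning constant that penalizes
    long paths through weak responses.
    """
    rows, cols = len(response), len(response[0])
    root_metric = response[start_row][0] - bias
    # heapq is a min-heap, so metrics are negated to pop the largest first.
    stack = [(-root_metric, [start_row])]
    while stack:
        neg_metric, path = heapq.heappop(stack)    # current most likely path
        col = len(path) - 1
        if col == cols - 1:
            return path, -neg_metric               # top path spans the image
        for step in (-1, 0, 1):                    # extend by one observation
            row = path[-1] + step
            if 0 <= row < rows:
                metric = -neg_metric + response[row][col + 1] - bias
                heapq.heappush(stack, (-metric, path + [row]))
    return None, float("-inf")

# Illustrative operator outputs: strong responses trace rows 1, 1, 0, 0, 1.
response = [
    [1, 1, 8, 9, 1],
    [9, 8, 1, 1, 7],
    [1, 1, 1, 1, 1],
]
path, metric = sequential_edge_search(response, start_row=1, bias=5)
```

On this small example only five paths are ever extended before the search returns the rows `[1, 1, 0, 0, 1]` with metric 16, whereas an exhaustive search would score every admissible path. The first complete path to reach the top of the list is not guaranteed to be the globally best one; the sketch shows the mechanism, not an optimality claim.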
Sequential edge detection has several potential advantages over the techniques discussed earlier. First, it offers the possibility of better performance (higher detection probability, lower false alarm probability) at low image signal-to-noise ratio than the local operators, since it obtains many more observations along the edge. For the same reason, the problem of choosing a detection threshold is alleviated: it is much easier to decide "edge" or "no edge" based on many observations than on a few. Secondly, by the very nature of the searching process, the detected edge paths exhibit complete connectivity. Therefore, streaking can be eliminated. Although it is not obvious from this discussion, two subtle advantages to a sequential approach also arise. One is that it allows an analytical treatment of the probability that segments of detected edge contours are in error rather than merely points. The second is that it provides a framework in which the correlation between observations in an image can be exploited to aid in the detection process. The principal disadvantage of a sequential approach is one of computational speed. In fact, on a sequential processor such algorithms tend to be more efficient than parallel algorithms, but the latter have the potential of dramatically improved speed on special-purpose parallel architectures.
We will return to discuss all of the foregoing ideas in more detail later, but first we shall review preceding efforts in the literature related to the detection of edges in images by sequential methods. Early work by Fischler and Elschlager, "The representation and matching of pictorial structures," I.E.E.E. Trans. Computers, vol. C-22, pp. 67-92, January 1973, while not precisely a sequential edge detection technique, nevertheless represents one of the earliest efforts to recognize the fact that improved performance at low signal-to-noise ratio can be accomplished by considering edge contours as a whole rather than local points. In their method, hypotheses consist of "embedded" edge contours. For each such embedded contour, an associated cost is calculated. A dynamic programming technique is used to search the (enormous) space representing all possible contours in an effort to find that with the lowest associated cost. In a similar fashion, R. Griffith, "Automatic boundary and feature extraction for the estimation of left ventricular volume in man from serial selective angiocardiograms," D. Eng. Thesis, Rensselaer Polytechnic Institute, August 1973, has also used a dynamic programming technique.
Both of these methods have two common problems. First, dynamic programming techniques attempt to find the best hypothesis in a search. Although they are more efficient than an exhaustive search, the amount of computation for even modest image sizes is still very large. Secondly, the associated cost of an embedded contour is an ad-hoc quantity requiring a considerable amount of tailoring to specific images.
In an effort to reduce the amount of computation involved with dynamic programming methods, Chien and Fu, "A decision function method for boundary detection," CGIP, vol. 3, pp. 125-140, 1974, have proposed the use of what is known as depth-first tree searching. They explicitly formulate the edge detection problem as a search through a rooted tree. Their branch "costs," however, are highly specialized to the type of image under consideration and employ a great deal of a-priori information. In a similar fashion, Martelli, "An Application of Heuristic Search Methods to Edge and Contour Detection," Commun. ACM, Vol. 19, pp. 73-83, 1976, formulates the problem as a graph search and uses the A* algorithm, N. Nilsson, Problem-Solving Methods in Artificial Intelligence, McGraw-Hill, 1971, to perform the search. This algorithm is quite similar to the Z-J or stack algorithm described below. Again, Martelli's cost function is ad-hoc and peculiar to the type of image under consideration.
Extending the work of Martelli, D. Cooper, "Maximum likelihood estimation of Markov-process blob boundaries in noisy images," IEEE Trans. PAMI, vol. PAMI-1, pp. 372-384, October 1979, has also used the A* search algorithm. He has attempted to take some of the arbitrariness out of the cost function by basing that function on a likelihood statistic. He models hypothesized edge contours as a Markov process. The image is modeled as consisting of two region types, "background" and "object," separated by the edge contour. The two types are assumed to be of different but constant intensity. The cost statistic is then the joint likelihood of the particular hypothesized edge contour and the image pixel values, given the assumption that all pixels inside the contour belong to the object at one gray level value and all outside belong to the background at the other gray level value. Note that for each hypothesized edge contour, the statistic must be calculated over the entire image.
This method represents an improvement over its forerunners but exhibits several serious drawbacks. On the plus side, the cost function is at least statistical in nature and so has a better theoretical basis than the heuristic functions discussed above. This allows for some performance analysis. Also, the Markov edge contour model captures an important characteristic of real edges, as discussed below. On the other hand, the assumption of only two image pixel types, "object" and "background," independent and of constant gray level values, is highly unrealistic. In practically any real image of interest, pixel gray level values within and outside of objects may vary considerably due to lighting inhomogeneities, shadows, and object inhomogeneities. Pixels are almost never stochastically independent. Furthermore, such a problem statement is only useful for finding bounding contours of objects and is useless for finding internal edges and other interesting edges, all of which may be important to subsequent processing.
Ashkar and Modestino in "The contour extraction problem with biomedical applications," CGIP, vol. 7, pp. 331-355, 1978, disclose a technique similar to that of Chien and Fu above, starting by formulating the edge detection problem as a tree search. They make the important step of applying the search to the output image of an edge operator rather than to the original image. This helps overcome some of the shortcomings of Cooper's approach. A Z-J or stack algorithm, discussed below, is used to perform the search. This method represents a fully mature example of sequential detection with additive branch costs or metrics and a truly sequential search. However, its metric suffers from two problems: it is again very ad-hoc in nature, with components that depend on experimentally determined parameters and look-up tables, and it requires a prototype contour. The latter is a contour provided by some a-priori knowledge base that helps to guide the search toward some preconceived estimate of the final edge map. This represents rather high quality a-priori information and, while possibly appropriate to certain narrow classes of problems, is a severe limitation to the method's generality. This metric formulation furthermore limits any analytical treatment of the method.
It is seen, therefore, that there is a need for a method which formalizes the use of sequential searching as an alternative to the use of local thresholds as a classification criterion on the output of edge operators. Since it is a processing step following the application of an edge operator, it is largely independent of which specific operator is used. The method should allow the integration of many observations along an edge contour to enter into the classification process in order to improve the performance at low signal-to-noise ratio, ease the difficulty in choosing a threshold, and eliminate the incidence of streaking in the resulting edge map.