The present invention relates to the automatic detection and interpretation of features in images, displays, and complex signals, and more particularly to methods for automatically detecting and interpreting features in images using the simulation of physical forces that force templates to move towards similar features and to deform to match such features. The present invention further relates to apparatus using the feature-extraction method for the purpose of providing automatic control or signal detection and interpretation.
The interpretation of images and displays is a function currently carried out largely in a manual fashion by skilled human interpreters. The interpretive function involves finding and identifying features and collections of features in imagery, such as a photograph, or in a display, such as a radar screen. In the past, a large number of aids have been developed which enhance the ability of human interpreters to carry out the interpretive function. These aids may restore general picture clarity which, for instance, may have been reduced by shortcomings of the imaging process. This type of image processing is discussed in Andrews, H. C. and B. R. Hunt, Digital Image Restoration, Prentice-Hall, 1977, pp. 113-124 (hereafter "Andrews and Hunt"). Another kind of aid enhances the brightness of certain kinds of features in an image, such as edges, to make them more readily apparent to the eye. These aids are described extensively in Pratt, W. K., Digital Image Processing, John Wiley & Sons, 1978, pp. 471-550 (hereafter "Pratt").
Techniques which attempt to automate the image interpretation task with the object of replacing the human interpreter are very limited in capability at the present time. The approach that has been used most successfully is based on a paradigm of building up large structures from smaller structures, occasionally reversing the procedure to correct for mistakes. One example, which is called edge detection, consists of combining an edge enhancement process with a thresholding process. In the combined procedure, the image is processed in such a way that pixels at edges tend to become brighter than other pixels in the image. Then pixels above a certain brightness level are labeled as hypothetical edge points. Hypothetical edge points which form a sequence based on adjacency are then assembled into hypothetical continuous line segments. Isolated edge points are dropped. Then, based on tests of certain numerical statistics such as similarity in intensity or color, or collinearity, disconnected line segments are associated to form longer line segments. At each point in this process, statistical decision theory, as described for example in Fukunaga, K., Introduction to Statistical Pattern Recognition, Academic Press, 1972, pp. 1-121 (hereafter "Fukunaga"), or Duda, R. O. and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, 1973, pp. 1-39 (hereafter "Duda and Hart"), may be applied to accept or reject certain hypothetical structures.
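The edge-detection procedure described above may be sketched in outline as follows. This is a minimal illustrative sketch, not a production detector: the tiny test image, the gradient-magnitude enhancer, the threshold value, and the 4-adjacency rule are all assumptions chosen for clarity.

```python
# Sketch of the edge-detection pipeline: enhance edges, threshold the
# enhanced image, then link adjacent hypothetical edge points into
# segments and drop isolated points.

def gradient_magnitude(img):
    """Approximate |gradient| by forward differences (a simple edge enhancer)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = img[y][x + 1] - img[y][x]
            gy = img[y + 1][x] - img[y][x]
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

def edge_points(mag, threshold):
    """Label pixels brighter than the threshold as hypothetical edge points."""
    return {(y, x)
            for y, row in enumerate(mag)
            for x, v in enumerate(row) if v > threshold}

def link_segments(points):
    """Assemble 4-adjacent edge points into segments; drop isolated points."""
    segments, unvisited = [], set(points)
    while unvisited:
        seed = unvisited.pop()
        seg, frontier = {seed}, [seed]
        while frontier:
            y, x = frontier.pop()
            for n in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if n in unvisited:
                    unvisited.remove(n)
                    seg.add(n)
                    frontier.append(n)
        if len(seg) > 1:  # isolated edge points are dropped
            segments.append(seg)
    return segments

# A 5x5 image with a vertical step edge between columns 1 and 2:
image = [[0, 0, 9, 9, 9]] * 5
mag = gradient_magnitude(image)
segs = link_segments(edge_points(mag, threshold=4.0))
print(len(segs))  # → 1 (the step edge yields one connected segment)
```

Note that the thresholding step already exhibits the fragility discussed below: a noisy pixel exceeding the threshold becomes a spurious hypothetical edge point.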
Pattern recognition techniques which build large structures from smaller structures have several disadvantages. In general there is a large number of small structures to identify, and an extremely large number of combinations to analyze. If there is no simple way to reduce the number of combinations that must be examined, the process suffers exponential growth in the number of operations to be performed. The result is that for even moderately sized problems, the number of computations involved is beyond the capability of any computer. Furthermore, small features in an image are easily obscured by noise; thus any technique that depends on small features is defeated at the outset. Conversely, spurious features may also be present; for instance, edge enhancement procedures will spuriously enhance many points which do not lie on an edge. Another problem is that techniques for associating disconnected line segments, for instance the two visible parts of a line passing under an obstruction, are not well defined, and their performance is difficult to evaluate. Finally, algorithms in which operations depend on tests are difficult to implement on parallel computer architectures.
Recent work in Artificial Intelligence (AI) has aimed at reducing the computational size of vision problems. See, e.g., Winston, P. H., Artificial Intelligence, 2d Ed., Addison-Wesley, 1984, pp. 159-169 (hereafter "Winston"). This is accomplished by a process identified as goal reduction: building larger features from smaller features. In this process, a sequence of several intermediate representations of features is constructed, each representation being of higher complexity than the earlier ones. Advantageously, AI approaches are usually implemented using a rule-based problem-solving paradigm. In this paradigm, a collection of rules is specified, each of which causes a certain function to be performed if certain conditions are satisfied. The advantage of the rule-based approach over statistical pattern recognition techniques is that non-numeric information can be exploited. This information includes knowledge of the physical and cultural context of the image as well as natural constraints related to the fundamental topology of shapes. Winston formalizes the feature recognition process as a two-step procedure called Generate-and-Test. The implementation of this process involves a generator module and a tester module. At each level of representation in the feature extraction process, hypothetical features are generated and then tested against criteria contained in the rules. One of the major goals of AI research in vision has been to exploit contextual and constraint information to limit the number of hypothetical features that must be generated in order to generate an acceptable one. However, the rule-based paradigm has been more successful at the testing function, a success paralleling the earlier successes of rule-based systems in medical diagnosis.
Another technique known in the art for image interpretation attempts to recognize large-scale features in their entirety. The central tool in this approach is correlation or template matching, as described in Levine, M. D., Vision in Man and Machine, McGraw-Hill, 1985, pp. 46-52 (hereafter "Levine"). Template matching is basically a numerical measure of similarity between a portion of the image and an idealization or model of the feature one is looking for, called a template. This approach seems to avoid the combinatorial growth problems, is well defined in execution, and is easily implemented on parallel computer architectures. When the template is an exact duplicate of the feature in the image, and the template can be compared with the image at the exact position and orientation of the feature, the similarity measure between the template and the image will be very high at that position and orientation. The procedure is robust, even in the presence of noise in the image. Disadvantageously, in the real world, image features are seldom identical to the templates due to changes in apparent size and perspective, distortion in the imaging system, and the natural variability between different objects. Unfortunately, even slight distortions degrade the performance of the correlation matcher to such an extent that the match signal is obscured by the fluctuations due to commonly observed levels of noise in the image. The only remedy for this degradation is to compare the template to the image at all positions, orientations, sizes, perspectives, known distortions, etc. This process is generally prohibitively expensive.
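Template matching by correlation can be sketched as follows. This is a minimal illustration using normalized correlation over all positions only (one of the many dimensions of search noted above); the image, the template, and the normalization are illustrative assumptions rather than a particular matcher from the literature.

```python
# Sketch of template matching: slide a template over the image and
# score each position by normalized correlation.

def correlate(image, template, y0, x0):
    """Normalized correlation between the template and the patch at (y0, x0)."""
    th, tw = len(template), len(template[0])
    num = si = st = 0.0
    for y in range(th):
        for x in range(tw):
            a = image[y0 + y][x0 + x]
            b = template[y][x]
            num += a * b
            si += a * a
            st += b * b
    return num / ((si ** 0.5) * (st ** 0.5) + 1e-12)  # epsilon avoids 0/0

def best_match(image, template):
    """Compare at every position; return the best (score, row, col)."""
    th, tw = len(template), len(template[0])
    h, w = len(image), len(image[0])
    return max(
        (correlate(image, template, y, x), y, x)
        for y in range(h - th + 1)
        for x in range(w - tw + 1))

image = [
    [0, 0, 0, 0],
    [0, 5, 5, 0],
    [0, 5, 5, 0],
    [0, 0, 0, 0],
]
template = [[5, 5], [5, 5]]
score, y, x = best_match(image, template)
print(round(score, 2), y, x)  # → 1.0 1 1 (exact match at row 1, col 1)
```

Note that the search above runs over positions only; adding orientations, sizes, perspectives, and known distortions multiplies the number of comparisons, which is the prohibitive expense identified in the text.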
Artificial Neural Systems (ANS) technology is a parallel technology to the present invention. The basic objective of ANS is to design large systems which can automatically learn to recognize categories of features, based on experience. The approach is based on the simulation of biological systems of nerve cells. Each nerve cell is called a neuron; systems of neurons are called neural systems or neural networks. The various software and hardware simulations are called artificial neural systems or networks. Each neuron responds to inputs from up to 10,000 other neurons. The power of the technology is in the massive interconnectivity between the neurons. Neural networks are often simulated using large systems of ordinary differential equations, where the response of a single neuron to inputs is governed by a single differential equation. The differential equations may be solved digitally using finite difference methods or using analog electronic circuits. Large scale analog implementations seem to be beyond the current state of the art. Other implementations based on large-scale switching circuits have also been proposed.
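The differential-equation formulation described above may be sketched with a forward-difference (Euler) solution of a small network in which each neuron's state obeys one ordinary differential equation. The particular dynamics (a leaky state driven by weighted sigmoid outputs), the weights, and the inputs below are illustrative assumptions, not a specific network from the cited literature.

```python
import math

# Sketch: each neuron i obeys  du_i/dt = -u_i + sum_j w_ij * g(u_j) + I_i,
# solved digitally by a finite-difference (forward Euler) method.

def g(u):
    """Sigmoid activation relating a neuron's state to its output."""
    return 1.0 / (1.0 + math.exp(-u))

def step(u, w, inputs, dt):
    """One Euler step of the network dynamics."""
    outputs = [g(x) for x in u]
    n = len(u)
    return [ui + dt * (-ui + sum(w[i][j] * outputs[j] for j in range(n))
                       + inputs[i])
            for i, ui in enumerate(u)]

# Two mutually inhibiting neurons with different external inputs:
w = [[0.0, -2.0], [-2.0, 0.0]]
inputs = [1.5, 0.5]
u = [0.0, 0.0]
for _ in range(2000):          # integrate to t = 20 with dt = 0.01
    u = step(u, w, inputs, dt=0.01)
print(u[0] > u[1])  # → True (the more strongly driven neuron wins)
```

In an analog electronic implementation, the same equation would be realized directly by circuit dynamics rather than by the digital time-stepping loop shown here.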
There are currently two major thrusts in ANS research and development. One thrust, exemplified by Grossberg, S. and E. Mingolla, "Neural Dynamics of Form Perception: Boundary Completion, Illusory Figures, and Neon Color Spreading," Psychological Review, 1985, Vol 92, No. 2, pp. 173-211 (hereafter "Grossberg"), attempts to use the neural network simulations to recreate the functions of the brain. The other thrust, represented by researchers Tank and Hopfield, aims at demonstrating that many types of currently difficult problems can be solved efficiently on ANS hardware using the ordinary differential equation which also models neurons. See, Tank, D. W. and J. J. Hopfield, "Simple `Neural` Optimization Networks: An A/D Converter, Signal Decision Circuit, and a Linear Programming Circuit," IEEE Transactions on Circuits and Systems, Vol. CAS-33, No. 5, pp. 533-541 (May 1986) (herein "Tank and Hopfield").
One of the more common models for pattern recognition known in the art is the classification model, described by Duda and Hart as follows:
"This model contains three parts: a transducer, a feature extractor, and a classifier. The transducer senses the input and converts it into a form suitable for machine processing. The feature extractor . . . extracts presumably relevant information from the input data. The classifier uses this information to assign the input data to one of a finite number of categories." Duda and Hart, p. 4.
With respect to the division between the functions of the feature extractor and the classifier, Duda and Hart go on to say:
"An ideal feature extractor would make the job of the classifier trivial, and an omnipotent classifier would not need the help of a feature extractor." Duda and Hart, p. 4.