1. Field of the Invention
This invention relates to systems for identifying images.
2. Description of the Prior Art
Digital electronic technology (and particularly digital computers) has changed almost every facet of modern life. In spite of the ever-increasing use of digital technology, life still goes on in an analog fashion. Visual, tactile, and audio images still comprise the bulk of sensory experiences for human beings. Full exploitation of digital technology has been limited by the difficulty of converting these analog images to digital data and of distinguishing the images from each other.
Converters which can digitize a visual image or a series of sounds are now commonplace. Any audio or visual image can be converted to an array of digital data. The problem is, however, to deal with that data in a meaningful manner.
Conventional pattern or image recognition technology has serious speed limitations which in general originate from the use of conventional digital computer processing architecture. This architecture requires the use of serial processing algorithms which do not easily accommodate large amounts of parallel information.
Two methods are commonly used in the prior art to recognize patterns: "template matching" and "feature extraction". In the template matching method, a reference pattern is stored for each response. Each input image is then compared with each reference pattern until a match is found. The number of reference patterns which can be recognized is limited, since substantial time and memory are required to serially search for a match. Because of these practical limitations on speed and memory, this technology cannot accommodate applications such as natural speech input, visually guided motion, or object tracking.
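The serial search described above can be illustrated with a minimal sketch. It assumes images are flat sequences of binary pixels and that a "match" means a Hamming distance within a tolerance; the names `hamming_distance`, `match_template`, and `max_distance` are hypothetical and do not come from any prior art system.

```python
from typing import Dict, Optional, Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length pixel arrays differ."""
    if len(a) != len(b):
        raise ValueError("image and reference pattern must be the same size")
    return sum(x != y for x, y in zip(a, b))


def match_template(
    image: Sequence[int],
    templates: Dict[str, Sequence[int]],
    max_distance: int = 0,
) -> Optional[str]:
    """Compare the image with each stored reference pattern in turn.

    Returns the response label of the first pattern within max_distance,
    or None if no stored pattern matches. (Assumed matching rule.)
    """
    for label, reference in templates.items():
        if hamming_distance(image, reference) <= max_distance:
            return label
    return None


# One reference pattern is stored per response.
templates = {
    "A": [0, 1, 1, 0],
    "B": [1, 1, 0, 0],
}
result = match_template([1, 1, 0, 0], templates)
```

In this sketch each lookup costs time proportional to the number of stored patterns multiplied by the image size, and memory grows with every added pattern, which is the limitation noted above.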
The feature extraction method attempts to speed up this process. Rather than match an entire image, a small set of features is extracted from the image and compared to a reference set of features. This method can be very complex, as well as time-consuming. An example of the complexity involved in the feature extraction technique is the problem of recognizing a person's face. The difficulty of defining the features of a person's face mathematically and then writing a procedure to recognize these features in an image is overwhelming.
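As a rough illustration of the feature extraction approach, the sketch below reduces a binary image to three hand-chosen features (area and bounding-box width and height) and picks the reference whose features are closest. The feature set, the squared-distance comparison, and the names `extract_features` and `classify` are all assumptions made for this example. Choosing features that work for a real problem such as a face is the difficult part, and nothing here addresses it.

```python
from typing import Dict, List, Tuple

Features = Tuple[int, int, int]


def extract_features(image: List[List[int]]) -> Features:
    """Reduce a binary image to (area, bounding-box width, bounding-box height)."""
    rows = [r for r, row in enumerate(image) if any(row)]
    if not rows:
        return (0, 0, 0)  # blank image: no set pixels
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    area = sum(sum(row) for row in image)
    width = cols[-1] - cols[0] + 1
    height = rows[-1] - rows[0] + 1
    return (area, width, height)


def classify(image: List[List[int]], references: Dict[str, Features]) -> str:
    """Return the label whose reference features are nearest to the image's."""
    features = extract_features(image)

    def squared_distance(label: str) -> int:
        return sum((f - r) ** 2 for f, r in zip(features, references[label]))

    return min(references, key=squared_distance)


# Hypothetical reference feature sets, one per response.
references = {
    "horizontal_bar": (3, 3, 1),
    "vertical_bar": (3, 1, 3),
}
bar = [
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
label = classify(bar, references)
```

Only three numbers per reference are compared rather than every pixel, but the extraction procedure itself must be designed and programmed separately for each kind of image.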
Most conventional approaches to pattern recognition represent information from images in a format which is incompatible with spatial or temporal integration. For example, each image type or image source typically has unique processing algorithms, and the results are not easily combined. In speech, there is generally no common representation of information from the acoustic level to the word, phrase, or semantic levels (temporal integration). As a result, conventional speech recognition methods typically deal with incompatible information formats at every level. Severe processing demands are made in order to accommodate this situation. In the case of multiple visual images (e.g. one image for each primary color or one image from each camera) the descriptive language (information format) from each image is not easily combined to describe a single image identity (spatial integration). In another more obvious example, the descriptive language typically used for the visual image of an object (areas, perimeters, etc.) is certainly incompatible with the descriptive language for the sound which the object may be producing.
Conventional techniques generally require special computer programming to suit each specific application. Each application frequently requires: a detailed analysis of the expected input images to identify their differences; the development of a model (usually mathematical) to define the differences in computer language; and the development of generally complex methods to extract the features from the images. This requires skilled personnel to specify and program the complex algorithms on digital computers, and also requires expensive computer programming development facilities. This development process generally must be repeated for each new type of input image.
In those applications where the input images can be totally specified, conventional technology has generally been successful. An example is the field of optical character recognition, which has been the object of considerable research and development over the past twenty-five years. On the other hand, in those applications which deal with time-varying images which frequently cannot be prespecified, the conventional technology either has failed to provide technical solutions or has resulted in extremely complex and expensive systems.
There is a continuing need for improved pattern recognition systems in many fields including speech recognition, robotics, visual recognition systems, and security systems. In general, the existing pattern recognition systems in these fields have had serious shortcomings which have limited their use.