1. Field of the Invention
This invention relates to systems for identifying images.
2. Description of the Prior Art
Digital electronic technology (and particularly digital computers) has changed almost every facet of modern life. In spite of the ever-increasing use of digital technology, life still goes on in an analog fashion. Visual, tactile, and audio images still comprise the bulk of sensory experiences for human beings. Full exploitation of digital technology has been limited by the ability to convert these analog images to digital data and to distinguish the images from each other.
Converters which can digitize a visual image or a series of sounds are now commonplace. Any audio or visual image can be converted to an array of digital data. The problem is, however, to deal with that data in a meaningful manner.
Conventional pattern or image recognition technology has serious speed limitations which in general originate from the use of conventional digital computer processing architecture. This architecture requires the use of serial processing algorithms which do not easily accommodate large amounts of parallel information.
Two methods are commonly used in the prior art to recognize patterns: "template matching" and "feature extraction". In the template matching method, a reference pattern is stored for each response. Each input image is then compared with each reference pattern until a match is found. The number of reference patterns which can be recognized is obviously limited, since substantial time and memory are required to serially search for a match. Because of practical limitations on speed and memory, this technology cannot accommodate applications such as natural speech input, visually guided motion, or object tracking.
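The serial search described above can be illustrated with a minimal sketch. It is not taken from any particular prior art system: the binary-image representation, the pixel-mismatch comparison, the `tolerance` parameter, and the names `mismatch_count` and `match_template` are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

Image = List[List[int]]  # hypothetical representation: rows of 0/1 pixels


def mismatch_count(a: Image, b: Image) -> int:
    """Count pixel positions where two equally sized images differ."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def match_template(image: Image, references: Dict[str, Image], tolerance: int = 0) -> Optional[str]:
    """Serially compare the input with each stored reference pattern.

    Returns the response label of the first reference that differs from the
    input by at most `tolerance` pixels, or None if no reference matches.
    """
    for label, pattern in references.items():  # one full comparison per stored pattern
        if mismatch_count(image, pattern) <= tolerance:
            return label
    return None


# One stored reference pattern per response (illustrative data only).
references = {
    "vertical_bar": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "horizontal_bar": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
}
noisy_input = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]  # vertical bar with one flipped pixel

print(match_template(noisy_input, references, tolerance=1))
```

In this sketch both the search time and the storage grow with the number of reference patterns multiplied by the image size, which is the speed and memory limitation noted above.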
The feature extraction method attempts to speed up this process. Rather than match an entire image, a small set of features is extracted from the image and compared to a reference set of features. This method can be very complex, as well as time-consuming. An example of the complexity involved in the feature extraction technique is the problem of recognizing a person's face. The difficulty of defining the features of a person's face mathematically and then writing a procedure to recognize these features in an image is overwhelming.
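As a minimal sketch of the feature extraction approach, an image can be reduced to a few numbers that are then compared with stored reference features. The three features chosen here (area and bounding-box width and height), the sum-of-absolute-differences comparison, and the names `extract_features` and `classify` are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # hypothetical representation: rows of 0/1 pixels
Features = Tuple[int, int, int]  # (area, bounding-box width, bounding-box height)


def extract_features(image: Image) -> Features:
    """Reduce a binary image to three illustrative features."""
    on_pixels = [(r, c) for r, row in enumerate(image) for c, value in enumerate(row) if value]
    if not on_pixels:
        return (0, 0, 0)
    rows = [r for r, _ in on_pixels]
    cols = [c for _, c in on_pixels]
    return (len(on_pixels), max(cols) - min(cols) + 1, max(rows) - min(rows) + 1)


def classify(image: Image, reference_features: Dict[str, Features]) -> str:
    """Return the label whose reference features are closest to the input's features."""
    features = extract_features(image)
    return min(
        reference_features,
        key=lambda label: sum(abs(f - r) for f, r in zip(features, reference_features[label])),
    )


# Reference feature sets stored in place of whole images (illustrative data only).
reference_features = {
    "vertical_bar": (3, 1, 3),
    "horizontal_bar": (3, 3, 1),
}
noisy_input = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]  # vertical bar with one extra pixel

print(extract_features(noisy_input), classify(noisy_input, reference_features))
```

Features this simple are adequate only for trivial shapes. For an image such as a face, deciding which features to compute and how to compute them is the difficult part, which is the complexity described above.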
Most conventional approaches to pattern recognition represent information from images in a format which is incompatible with spatial or temporal integration. For example, each image type or image source typically has unique processing algorithms, and the results are not easily combined. In speech, for example, there is generally no common representation of information from the acoustic level to the word, phrase, or semantic levels (temporal integration). As a result, conventional speech recognition methods typically deal with incompatible information formats at every level. Severe processing demands are made in order to accommodate this situation. In the case of multiple visual images (e.g., one image for each primary color or one image from each camera), the descriptive language (information format) from each image is not easily combined to describe a single image identity (spatial integration). In another more obvious example, the descriptive language typically used for the visual image of an object (areas, perimeters, etc.) is certainly incompatible with the descriptive language for the sound which the object may be producing.
Conventional techniques generally require special computer programming to suit each specific application. Each application frequently requires: a detailed analysis of the expected input images to identify their differences; the development of a model (usually mathematical) to define the differences in computer language; and the development of generally complex methods to extract the features from the images. This requires skilled personnel to specify and program the complex algorithms on digital computers, and also requires expensive computer programming development facilities. This development process generally must be repeated for each new type of input image.
In those applications where the input images can be totally specified, conventional technology has generally been successful. An example is the field of optical character recognition, which has been the object of considerable research and development over the past twenty-five years. On the other hand, in those applications which deal with time varying images which frequently cannot be prespecified, the conventional technology either has failed to provide technical solutions, or has resulted in extremely complex and expensive systems.
There is a continuing need for improved pattern recognition systems in many fields including speech recognition, robotics, visual recognition systems, and security systems. In general, the existing pattern recognition systems in these fields have had serious shortcomings which have limited their use.
Existing speech recognition systems generally have the following disadvantages. First, they exhibit "speaker dependence"--only the speakers trained on the system can use it reliably. Second, they typically provide only isolated word recognition--the speaker must pause between words in order to allow adequate processing time. Third, they have small vocabularies--typically fewer than one hundred words. Fourth, they are very sensitive to extraneous noises. Fifth, they have very slow response times. These properties have greatly limited the desirability and applicability of speech recognition systems.
Some commercially available speech recognition systems offer connected speech or speaker independence. These systems, however, are very expensive and have small vocabularies. None of the presently available speech recognition systems combines speaker independence, connected speech, large vocabulary size, noise immunity, and real time speech recognition.
Commercially available visual image recognition systems generally do not recognize time varying images. Although systems capable of recognizing time varying images have been proposed, they appear to be very expensive and complex.
The field of robotics provides a particularly advantageous application for pattern recognition. Existing robot applications suffer from too little input of usable information about the environment in which the robot is operating. There is a need for a recognition system which provides recognition of the natural environment in which the robot is operating and which provides signals to the robot control system to permit reaction by the robot to the environment in real time. For example, with visual image recognition on a real time basis, "hand/eye" coordination by a robot can be simulated. This has significant advantages in automated assembly operations. The prior art pattern recognition systems, however, have been unable to fulfill these needs.
Security and surveillance systems typically utilize real time visual input. In many cases, this input information must be monitored on a manual basis by security personnel. This reliance upon human monitoring has obvious drawbacks, since it is subject to human error, fatigue, boredom, and other factors which can affect the reliability of the system. There is a continuing need for pattern recognition systems which provide continuous monitoring of visual images and which provide immediate response to abnormal conditions.