The present invention pertains to a system for object recognition and, more particularly, to a system for guiding users to formulate and use extraction rules to analyze an image and to recognize objects therein.
Since the launch of the Earth Resources Technology Satellite (ERTS, now LANDSAT) in 1972, researchers in image processing and remote sensing have searched, and continue to search, for a better, more efficient way to extract objects from image data. One of the ways to achieve this goal has been through the use of higher technology hardware architectures, algorithms, and programming languages.
In the 1970s, this field was relatively new, and free thinking and novel approaches were highly encouraged. As a result, a number of innovative image processing languages were developed and tested. LANGUAGES AND ARCHITECTURES FOR IMAGE PROCESSING (Duff, M. J. B. and Levialdi, S., editors, 1981: Academic Press) discussed these early high level languages, providing examples which appeared at the pages therein, referenced as follows: (1) PICASSO-SHOW, p. 18 et seq., (2) L-language, p. 39, (3) MAC, p. 48 et seq., (4) PIXAL, p. 95 et seq., and (5) IFL, p. 113. Also known is the LISP language, its use for image processing being described in THE ELEMENTS OF ARTIFICIAL INTELLIGENCE: An Introduction Using LISP (Tanimoto, Steven L., Computer Science Press, p. 400 et seq.). Finally, as described hereinbelow, natural language has also been applied to image processing. As illustrated in Duff and Levialdi, none of these languages was English-like. Therefore, none could be understood by average, lay users.
In a general sense, a computer is designed to compute and solve a problem by using a software system. For the machine system to be very efficient, the software should be written in a low level language. This approach exacts a high price in developing and coding a solution algorithm. At the other end of the spectrum, developing and coding a high level language algorithm is much less costly; however, computing time is much longer. Therefore, one of the important tasks of computer science is to optimize the machine/algorithm system by compromising from both ends, making the machine an extension of the algorithm and the algorithm, in turn, an extension of the machine system, as noted by Wood in &#8220;The Interaction between Hardware, Software and Algorithms,&#8221; in Duff and Levialdi. While this paradigm has worked very well for the past 40 years or more, the user's own ability to solve problems is entirely absent from it.
Since the early 1980s, researchers have noticed that under the hardware/software interaction paradigm, few people (except programmers) can truly communicate with a machine system. Attempting to correct this obvious deficiency, researchers have begun to develop human-based, and specifically, English-based interface systems as a part of natural language processing. The result has largely been in the domain of a man-machine dialogue, as shown in Table 1, reprinted from Duff and Levialdi, p. 218.
The extensions of this approach are the current structured query language (SQL) and expert system/knowledge based systems.
While introduction of natural language processing into a hardware/software/algorithm system has integrated users into a problem solving system, the ability of a user is ignored, because a cognitive process in solving a problem has not taken place. This is true because: (1) the user cannot understand the language used in the algorithm; and (2) the English-based, man-machine dialogue boxes cannot guide the user to solve the problem. This condition has not changed since the mid-1980s, as evidenced by an expert system language called LISP, which was popular in the late 1980s, and IDL, a current, relatively high level interactive data language for image processing and visualization.
In summary, none of these historical and current image processing related languages has been able to guide the user to develop a solution algorithm, and improve his or her skills in object extraction by interacting with the vocabularies and syntax of the language. In other words, there has been no cognitive process in problem solving experienced by the users of these languages.
More generally, it has been found that any task, relatively simple or complex, in any field of endeavor, can be subject to learning by an unsophisticated or underskilled, but trainable user. Thus, the technique to which this invention is directed is applicable to a wide variety of subject matter, especially when combined with simulation systems, in fields including, but not limited to: medicine (surgery), electronics, science, architecture, cooking, language, crafts, music, engine repair, aircraft and other machine operation, inventory control, and business. For purposes of explanation herein, however, the following disclosure is related to an environment of image processing; but it should be understood that the invention, as defined by the appended claims, is meant to encompass training techniques used in all suitable fields or subject matter, in which a relatively unskilled or underskilled trainee can become an expert.
It would be advantageous to provide users with a programming language that uses their own vocabularies, phrases and concepts or those of photo-interpreters to generate rule sets that are compilable to object extraction programs in real time.
It would be doubly advantageous, if the users are novices to begin with, to allow them to become experts without knowing any computer language; and if the users are experts, their knowledge can be captured, tested, and preserved for future users.
It would also be advantageous to provide users with an intelligent graphic panel with which to generate expert system code with a few or even no keystrokes.
It would further be advantageous to provide users with an intelligent editor with which to generate complex expert system code with a few or even no keystrokes.
It would still further be advantageous to provide users with an open, flexible, and editable expert system to capture the knowledge of experts in the field.
It would also be advantageous to provide users with an open, flexible, and editable expert system for testing and modifying an existing expert system.
It would further be advantageous to provide users with a programming language and related graphic user interface (GUI) and editor sub-systems to guide users to build solution systems of object extraction, helping them to become experts.
It would still further be advantageous to provide users with means to generate object-based transformations from multispectral and hyperspectral image data to guide them in building solution algorithms in object extraction.
It would also be advantageous to provide users with a means to generate fraction planes from a hyperspectral image cube in substantially real time to guide users to develop object extraction algorithms.
It would further be advantageous to provide users with a means to estimate the confidence of an object extraction process, be it coming from a rule based system, a matching analysis, or a combination of both.
The present invention features a method of training a user to become an expert in identifying an object in an image or scene, by querying a computer system. The computer system has a lexicon of photo-interpreters. The user can formulate object extraction rules, as he or she becomes an expert in object recognition and extraction. The method consists of providing a programming language that has information supplied by at least one expert photo analyst, and has optional extraction rules that are dependent upon that information, as well as information input by the user. The programming language has a vocabulary for facilitating descriptions of objects to be identified. Graphical results of the user's queries are interactively displayed to aid in determining whether an object has been identified by the user. In a more advanced embodiment, the user can mark a feature of interest of the image and direct the computer system to generate descriptive words, phrases and rules for defining that feature of interest.
Even extremely complex object matching can be accomplished by using only real number based arithmetic and/or a so-called matching library. First, a hyperspectral image cube that has a number of spectral regions is represented as a sum of a set of discrete data representative of each of the spectral regions. Then, a mean spectral reading value is obtained for each of the spectral regions. The mean spectral reading values are then used to build a pseudo multivariate distribution of the values. Using a Newton gravity model, the cumulative influence of substantially all of the spectral regions is computed for at least one of the spectral regions. Recognizable features are then extracted from the hyperspectral image cube. To determine how close or far one object is from another, a number of equally-weighted decisions is made, the final measure of proximity being the sum of all of the decisions. If each pixel in the image cube is compared to a calibrated spectrum or to a given pixel in the scene, fraction planes can be created, dependent on the percentage of match against the specified, calibrated spectral sample.
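The text above gives no formulas for these steps; the following minimal Python sketch illustrates one plausible reading of the per-band mean, the equally-weighted decision count, and the fraction-plane computation. The function names, the pixel-list data layout, and the 5% relative tolerance are all illustrative assumptions, not the invention's prescribed procedure.

```python
def band_means(cube):
    """Mean spectral reading per band. Here a 'cube' is a list of
    pixels, each pixel a list of B band readings (assumed layout)."""
    b = len(cube[0])
    n = len(cube)
    return [sum(pixel[i] for pixel in cube) / n for i in range(b)]

def proximity(pixel, reference, tol=0.05):
    """Sum of equally-weighted per-band match decisions (1 = match).
    The 5% relative tolerance is an assumed threshold."""
    return sum(1 for x, r in zip(pixel, reference)
               if abs(x - r) <= tol * abs(r))

def fraction_plane(cube, reference, tol=0.05):
    """Fraction of matching bands for every pixel, measured against
    a calibrated reference spectrum; one value per pixel."""
    b = len(reference)
    return [proximity(pixel, reference, tol) / b for pixel in cube]
```

A pixel identical to the reference scores a proximity equal to the number of bands, and its fraction-plane value is 1.0; pixels drift toward 0.0 as bands fall outside the tolerance.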
By the same principle, if an observed object is extracted from a fraction plane or any appropriate image, it can be matched against an image library that contains certain prototypical objects. The goodness-of-fit between the observed object and the closest element in the matching library is conceptualized as a confidence level. For a rule-based analysis, a confidence level of an object is assigned by the user. A combined confidence level is computed by using fuzzy set logic.
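The disclosure does not specify which fuzzy operator combines the rule-based and matching confidences; the sketch below uses the algebraic sum (probabilistic OR), a common fuzzy-union operator, purely as an assumed illustration.

```python
def combined_confidence(rule_conf, match_conf):
    """Combine two confidence levels in [0, 1] with the fuzzy
    algebraic sum: c = a + b - a*b (an assumed operator choice)."""
    return rule_conf + match_conf - rule_conf * match_conf
```

Under this operator the combined confidence never falls below either input and never exceeds 1, which matches the intuition that corroborating evidence should only strengthen a conclusion.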
The present invention uses the innovative object-recognition system described in U.S. patent application, Ser. No. 08/759,280 (H-350), hereby incorporated by reference. Objects are divided into two broad categories: first, those for which an analyst can articulate, after examining the scene content, how he or she would extract an object; and, second, those for which an analyst cannot articulate how to discriminate an object from other competing object descriptors, after examining the scene or a set of object descriptors (e.g., a spectral signature or a boundary contour).
In the first case, where an analyst is able to articulate the extraction of the objects, the proposed solution is to employ a pseudo-human language, including, but not limited to, pseudo-English, as a programming language. The analyst can communicate with a machine system by using this pseudo-human language, and then inform the machine how he or she would extract a candidate object without having to rely on a &#8220;third-party&#8221; programmer.
In the second case, where an analyst has determined that he or she is unable to articulate the extraction of an object, the proposed solution is to use an appropriate matcher with a matching library to extract the candidate object, and then pass it over to processors employed in the first-category sphere. The matching system of the present invention is accomplished by representing an observed object as a pixel in a multispectral or a hyperspectral image cube. An image cube contains two or more spectral bands. For example, a typical LANDSAT Thematic Mapper contains seven bands, and a HYDICE image cube contains 210 bands.
Conventional methods of matching objects in a signature domain are based largely on matrix theory. Thus, a matcher is usually associated with inverting a large-size matrix. This method is very computation intensive. For example, obtaining an eigenvector from an image cube of 200 bands, each band of dimension 512&#215;512 pixels, currently requires two hours using a SUN SPARCstation 2&#8482; computer. Using a non-matrix-theory-based method instead reduces the computing time for the same task to less than one minute on a Sun Ultra 10&#8482; system (300 MHz single processor).
Once an extracted object is passed over to the first environment, this object becomes describable by using the proposed pseudo-human language. Thus, it can be combined with other existing objects, each having a certain level of confidence, to extract still further objects. The final result, then, is the extraction of a set of complex or compound objects with a certain level of combined confidence.
Copending U.S. parent patent application, Ser. No. 08/759,280, filed by the present applicant on Dec. 2, 1996, for A LEXICON-BASED AND SCRIPT-BASED METHOD OF RECOGNIZING OBJECTS IN IMAGES, and hereby incorporated by reference, discloses a means of communication between an analyst and a computer. This human computer interface, in the form of a pseudo-human-based programming language, includes a photo-interpreter that can extract the two types of target complexes.
In addition to serving as an interface module between an analyst and a computer, this language functions in two significant ways: (1) it is a vehicle to capture and preserve the knowledge of the human analysts; and (2) it is an environment in which an analyst can organize his or her image-exploitation knowledge into computer-compilable programs. That is, it is an environment for knowledge-engineering automatic object-recognition processes.
A matching method is provided for recognizing objects in multispectral and hyperspectral image data. For single band data sets, a match can be performed using a library composed of two-dimensional or three-dimensional imagery elements. For multispectral and hyperspectral imagery, however, a match can be performed using a library composed of spectral signatures, each representing a specific type of material. Even extremely complex object matching can be accomplished by using only real number based arithmetic. First, a hyperspectral image cube that has a number of spectral regions is represented as a sum of a set of discrete data representative of each of the spectral regions. Then, a mean spectral reading value is obtained for each of the spectral regions. The mean spectral reading values are then used to build a pseudo multivariate distribution of the values. Using a Newton gravity model, the cumulative influence of substantially all of the spectral regions is computed for at least one of the spectral regions. Recognizable features are then extracted from the hyperspectral image cube. To determine how close or far one object is from another, a number of equally-weighted decisions is made, the final measure of proximity being the sum of all of the decisions. If each pixel in the image cube is compared to a calibrated spectrum or to a given pixel in the scene, fraction planes can be created, dependent on the percentage of match against the specified, calibrated spectral sample. Thus, matching is achieved by finding the closest match between an observed object and an element of the library, coupled with an associated goodness-of-fit indicator or a confidence level.
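As a concrete illustration of library matching with a goodness-of-fit indicator, the hypothetical sketch below scores each library signature by the fraction of bands that pass an equally-weighted match decision, and returns the closest signature together with that fraction as a confidence level. The names, the dictionary-based library, and the 5% tolerance are assumptions for illustration, not the disclosed implementation.

```python
def best_match(observed, library, tol=0.05):
    """Find the library spectral signature closest to an observed
    spectrum. Closeness is the fraction of matching bands; that
    fraction doubles as the goodness-of-fit / confidence level."""
    best_name, best_fit = None, -1.0
    for name, signature in library.items():
        # equally-weighted per-band decisions, summed and normalized
        hits = sum(1 for x, r in zip(observed, signature)
                   if abs(x - r) <= tol * abs(r))
        fit = hits / len(signature)
        if fit > best_fit:
            best_name, best_fit = name, fit
    return best_name, best_fit
```

Because every band contributes a simple yes/no decision, the method uses only real-number arithmetic and comparisons, with no matrix inversion, consistent with the non-matrix approach described above.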