The present invention relates to systems and methods for identifying objects within an image and, more particularly, using experts' knowledge to convert one or more pixels in a hyperspectral image into a 3-D image and for matching the hyperspectral image with predefined images for object recognition purposes.
In the field of image processing, it is desirable to improve search and knowledge extraction capabilities in order to facilitate the identification of features therein. One of the factors behind this pressing need is the imminent retirement of data experts who have rich experience and irreplaceable knowledge in information extraction with nuclear test data since the Cold War period. Another factor is the deterioration of the media in which the data is stored to the point that the data in its original form is not comprehensible. The present invention is motivated by the fact that none of the current content based retrieval (CBR) systems can be used to factor data experts' knowledge into the system described in articles in "Storage and Retrieval Image and Video Database I, II, III, IV, and V" (SPIE, 1993 through 1997).
In the physical world, an object is perceived as a two-dimensional and/or a three-dimensional entity that has a certain graytone, color, size and shape. To perceive the existence of such a color/size/shape defined object, the image data set must be processed so that pixels of similar properties can form a uniform field, the size and shape of which are measurable. The means by which these uniform fields are generated is referred to as a segmentation algorithm. Among the current systems, work by Soffer and Samet deals with symbolic-image databases with an emphasis on the concept of an object vs. a full scene or a grid cell. (SPIE Vol. 2679, pp 144-155.) The object by Soffer and Samet is a map feature in which segmentation is not needed.
In order to archive the information in a generic image automatically, the user must take an inventory of the features in the scene. The user must aggregate a set of spatially contiguous pixels into one substantially uniform region first. This pixel grouping process, generally referred to as segmentation, is the process of partitioning a scene into a set of uniform regions. There are countless ways to segment an image, yet few of them are reliable enough for the user to obtain consistent results with imagery of diverse characteristics acquired at varying time intervals. A number of sophisticated segmentation methods have been developed. For example, an edge based segmentation method is disclosed by Farag (1992), a region based method by Hsu (1992), a Markov random field based method by Tenior (1992), and a texture based method by Ma and Manjunath (1996). Ma and Manjunath discuss a comprehensive treatment for segmentation and region merging. Ma and Manjunath's algorithms appear powerful and effective, but the feature extraction is totally texture and grid cell based. Nevertheless, the fundamental difficulty in segmentation remains to this day.
The field that deals with enhancing the visual quality of an image and extracting information of an image is called image processing, pattern recognition, remote sensing, or photogrammetry. Objects in a generic image comprising one or more layers are conventionally stored as data representing graytone and/or color pixels mixed with background pixels. Each subject can be identified by matching it against elements of a spectral library, provided with enough bands, such as that in a hyperspectral image cube. The end result of this matched filtering process is a still, pixel classification map, which shows no information other than the fact that each pixel belongs to a specific class or material type.
In the past, a spectral angle mapper of pixel based spectral matching was used to compare an observed waveform (i.e., a spectral signature) with elements of a spectral library. In the present invention, each pixel is treated as if it is an element of the library; therefore, there is no need to create a spectral library at all.
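For reference, the conventional pixel based spectral angle comparison mentioned above can be sketched as follows. This is a minimal illustration in Python, not the implementation of the present invention; the function names `spectral_angle` and `best_match` and the sample spectra are hypothetical.

```python
import math


def spectral_angle(a, b):
    """Return the angle in radians between two spectra with the same band count."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of bands")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("an all-zero spectrum has no defined angle")
    # Clamp to [-1, 1] to guard against floating point round-off before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_theta)


def best_match(pixel, library):
    """Return the name of the library entry with the smallest angle to the pixel."""
    return min(library, key=lambda name: spectral_angle(pixel, library[name]))


# Hypothetical three-band library; the values are made up for illustration.
library = {"vegetation": [0.1, 0.2, 0.8], "soil": [0.4, 0.5, 0.6]}
label = best_match([0.2, 0.4, 1.5], library)
```

Because the angle ignores overall brightness, a spectrum and a scaled copy of it yield an angle of zero. The same `spectral_angle` function could compare one pixel directly against another pixel, which is one way to read the statement that each pixel is treated as an element of the library.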
When a Principal Component (PC) process is used in order to detect objects in images, two major problems exist. First, a PC transform is not based on any objects. Therefore, the selected PC scenes may not contain the desired objects. Second, as documented by Schott, a PC scene is based on variance. Therefore, a low contrast band will not be exhibited, signifying a loss in the depiction of an image.
The relevant papers in "Storage and Retrieval for Image and Video Databases", published by SPIE in 1993, reveal that few of the current systems are object based. In fact, the vast majority are full scene and/or grid cell based. (NeTra, by Ma and Manjunath of UC Santa Barbara, 1997.) One of the few object based programs deals only with simple objects such as human flesh tone vs. the background. (Tao, et al., SPIE Vol. 3002 p. 340-351, 1997.) The lack of object based systems is closely tied to the inability to segment a scene reliably and meaningfully. The reliability issue is associated with the stability of partitioned regions in the scene, whereas the meaningfulness of the scene is dependent on whether the knowledge of data experts is used in the segmentation algorithm.
Current image content retrieval systems lack the lexicon of data experts. A brief review of current image database search systems discussed in SPIE's "Storage and Retrieval of Image and Video Databases (1993-1997)" reveals that most image databases use a subset of tone (color), texture and shape principles. Data experts' knowledge using combinations of key words is simply not used in object extraction algorithms.
The present invention operates in the inventor's IMaG system and uses a pseudo-English programming language, which includes processing and query language. The IMaG system integrates: (1) image processing, (2) multi-source analysis, and (3) GIS (Geographic Information Systems) into one single environment that possesses the tools to assist in solving smart imagery archiving problems. For example, the user can capture the knowledge of data experts and convert it to object based content retrieval algorithms; and then use these expert systems to extract the target objects in the scene, outlining and labeling each of them with appropriate text and color symbols automatically. As a result, the experts' knowledge is used directly to create an enhanced image that contains both textual and logical representations in addition to the original physical representation.
The benefits of textual information for information retrieval are discussed by Kurakake, et al. (SPIE, Vol. 3022, pp. 368-379.) In addition, similar systems, such as DocBrowse, discussed by Jaisimba, et al. (SPIE, Vol. 2670, pp. 350-361), and IRIS, discussed by Hermes, et al. (SPIE Vol. 2420, pp. 394-405), assume that textual data is already on the images. The above mentioned systems simply assume that text describing the image is on the image without knowing whether the image directly corresponds with the text. In contrast, in this invention the textual information is inserted into the original image after a feature extraction process is completed.
In order to extract an object from an image data set, the user must perform a number of information processing steps. The most crucial step of all is programming the algorithm with a machine-understandable language. Programming with a conventional machine language such as FORTRAN, C, and/or C++, however, is tedious, time-consuming and error prone. Most languages used for retrieval purposes in conventional systems are generally known as query languages. One of the well-known languages is the Manchester Visual Query Language (Oakley, et al., SPIE, Vol. 1908, pp. 104-122.) Another language is the QBIC system, which is used for querying by color, texture and shape. (SPIE, Vol. 1908, pp. 173-187.)
Varying scenarios of image conditions and scene complexity can prevent the algorithm from finding the intended objects, despite the fact that a pseudo-English programming language can convert a concept to a computer executable program with ease. This is particularly true when the attributes of an observed, segmented object do not fit the definition of objects in the rule set. In addition, none of the existing image content retrieval systems is capable of converting a rule set of an individual data expert into a feature extracting algorithm in near real time. The rule set will not commit an error of commission under these conditions, but it would be desirable for additional object detection means to be provided as a complementary, or backup system.
In certain cases, the algorithm is virtually unprogrammable, simply because it is extremely difficult to translate a human-based concept to a machine language. Relating two objects that reside on two distinct image layers is a case in point. The present invention reflects a solution to this problem of incorporating a conventional machine language. It uses a pseudo-English programming language described in U.S. Pat. No. 5,631,970, issued on May 20, 1997, hereby incorporated by reference.
Computer users presently use a device called a Graphic User Interface (GUI), which has been invented to lessen the burden of typing upon a computer keyboard. Using the GUI, the user communicates with a computer through a mouse, and performs essentially only two actions: "point" and "click". Even the simplicity of the point and click method, however, can become a burden, if numerous point and clicks are necessary to initiate the execution of a program. The same process can be achieved by typing simple script under a UNIX operating system.
The minimalistic approach of pointing and clicking has been achieved to a certain degree in the database management field. It requires that data be organized in an orderly fashion. An attribute table is used to organize a data set in rows and columns, with each object occupying a row. On the other hand, image data is totally unsorted. Thus, the user cannot use a conventional GUI with ordinary database management software to point and click an object. For example, the user can attempt to extract a bright object, but it would be extremely difficult for the user unless the object itself were identified with a brightness attribute. Therefore, minimalism in GUI should be interpreted as an object extraction model that defines the characteristics of the target model. A minimalism based object identification and extraction process is achieved if the GUI is intelligent enough to take the target characteristics of the perceived object from the user, and convert the information into a text file based program.
In object identification and recognition, not all users prefer simple objects. At one extreme, the user may wish to compose the program unassisted and without any restrictions imposed by the models in the program. On average, users may use the intelligent GUI to assist in composing the script or rule set to answer a set of questions, without typing a text file.
Another factor that conventional content-based retrieval systems do not consider is the possible loss of the original data due to deterioration of the storage media. To preserve the segmented scenes of the original data, the inventive system performs a conversion, preserving the size, shape and even the topological information among regions/polygons. The means for converting raster segmented scenes into corresponding vectors is called a raster to vector conversion, while the reverse process is called vector to raster conversion. This vectorization process allows users to retrieve objects on a feature attribute table representation level.
Converting a segmented image into a corresponding vector representation yields two obvious benefits. First, the essential scene structure based information can be stored in a vector format for efficient storage and preservation without the loss of significant information. Second, efficient information retrieval and analysis can be realized in a Geographic Information System (GIS) environment.
The GIS based approach to image segmentation and data representation for information creation and retrieval is largely lacking in current systems. The accuracy of a feature vectorization process has long been a concern in the GIS community. For example, it was stated that "Rules have been established to control the conversion of raster to data vectors. However, the effects of the shape, size and spacing of polygons on the conversion process have not been vigorously explored. It is critical that research be undertaken to explore the effects of the raster-to-vector conversion process for digital remotely sensed data. Methods of quantifying the variation between vector-to-raster and raster-to-vector conversions must be developed. Only when the results from a variety of studies performing such conversions are understood and quantified can we begin to employ these techniques with some degree of confidence." (Closing Report of Research Initiative 12 of the National Center for Geographic Information and Analysis (NCGIA), p. 9, July, 1995, sponsored by the National Science Foundation.)
The object based image data can be converted into bona fide GIS layers that can be analyzed using standard GIS software such as ARC/Info or ARC/View. When information is partitioned into individual layers, each object in each layer, which is equivalent to a segmented image, is represented by its boundary contour in terms of node, arc and polygon. The entire scene is described by a feature attribute table, typically in a DBF format, that can be read by standard database management software such as dBASE.
It would be advantageous to provide a variety of means to process varying types of image data ranging from a zero band, such as a geo-position system (GPS) to n-band data, such as a hyperspectral image cube that contains hundreds of bands.
It would also be advantageous to provide a variety of strategies for object detection in such a way that an object can still be detected if a segmentor fails to perform the task.
It would also be advantageous to provide a segmentor to integrate varying object detection means.
It would also be advantageous to provide a mechanism for using a pseudo-English programming language with an intelligent GUI to identify and/or extract objects in images and/or any geo-coded data, with a varying degree of interaction between the user and the machine system.
It would also be advantageous to provide a mechanism for identifying and/or recognizing an object in any spatial data, including GPS, by browsing through a set of object recognition models and selecting one to execute the programs.
It would also be advantageous to provide a mechanism to identify and recognize an object in geo-spatial data by answering only a few questions.
It would also be advantageous to provide a system for identifying and recognizing an object in geo-spatial data by answering a set of questions as the basis for generating a complex object identification rule set script.
It would also be advantageous to provide advanced users with a system to identify and extract objects in spatial and geo-spatial data by composing a script with a standard general editor.
It would also be advantageous to provide a vectorizer/rasterizer system that treats an object vectorization as a data compression process in which the compression process is totally preserved without losing information and creating errors in location, size and shape of a polygon.
In accordance with the present invention, there is provided a process and a processing architecture for identifying and/or extracting an object from spatial and geo-spatial data, such as images, maps and GPS-generated data. The system allows an operator, using an intelligent graphic user interface (GUI), to communicate with the processing system, using varying levels of interaction.
First, a totally automated mode allows an operator to extract an object by browsing through a set of models, and then to select an appropriate one to run the program. Second, an expert modeler allows an operator to answer a few questions with which the system can generate a bona fide rule set script. Third, an expert editor allows an operator to answer a set of questions as the basis for generating a complex object identification and/or extraction program without a general editor. Fourth, the advanced user is able to use the general editor to compose a rule set which identifies the object.
An appropriate programming language in which to compose a rule set to accomplish the object identification comprises a human-like, pseudo-English language, which uses a lexicon of human photo-interpreters. Key words such as tone, texture, size, elongation, convolution, within, touches, approaches, etc., are part of this lexicon. This approach is presented in the aforementioned U.S. Pat. No. 5,631,970. The pseudo-English programming language simplifies the programming process for extracting objects with image data, but requires composing and typing using a general editor. Composing the rule set and typing the text file are still two steps away from the user and the computer. From the vantage point of the user, it would be useful to obtain a target object without being required to make a keystroke. The user should have to make only a minimum number of keystrokes, if keystrokes are needed at all.
For example, the user can first set the rule based object extraction process as a full scene based segmentation analysis. Then the first complementary object detection process can be based on a pixel based analysis, using a bright pixel detector in conjunction with a density based post processor.
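The complementary pixel based process described above can be illustrated with a short sketch. The source does not specify how the bright pixel detector or the density based post processor works, so everything below is an assumption made for illustration: a pixel counts as bright when its value meets a fixed `threshold`, and the post processor keeps a bright pixel only when at least `min_count` bright pixels (itself included) fall inside a square window of the given `radius`. All names are hypothetical.

```python
def bright_pixels(image, threshold):
    """Return the set of (row, col) positions whose value is >= threshold."""
    return {
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value >= threshold
    }


def dense_bright_pixels(image, threshold, radius=1, min_count=3):
    """Keep bright pixels that have enough bright neighbors in a square window.

    Assumption: density is the count of bright pixels, including the center,
    inside a (2 * radius + 1) square window around the pixel.
    """
    bright = bright_pixels(image, threshold)
    offsets = range(-radius, radius + 1)
    kept = set()
    for r, c in bright:
        count = sum((r + dr, c + dc) in bright for dr in offsets for dc in offsets)
        if count >= min_count:
            kept.add((r, c))
    return kept


# Made-up graytone image: one isolated bright pixel and one small bright cluster.
image = [
    [0, 0, 0, 0, 9],
    [0, 9, 9, 0, 0],
    [0, 9, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
detections = dense_bright_pixels(image, threshold=5)
```

In this sketch the isolated bright pixel in the corner is rejected while the three clustered pixels survive, which is the kind of filtering a density based post processor might provide.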
The present invention uses a hyperspectral texture analysis, which is a bona fide image normalization process, and a feature enhancement process. It is a data compression process that does not lose any information, since none of the original information is discarded in the process. Three searching windows, a compressed image cube of hundreds of bands, can yield three transformations.
The proposed hyperspectral texture transform does not lose the desired objects or information about the objects during the transform. In fact, in this invention, the texture transformation creates additional information for the object simply by adding a spatial dimension to the original data set.
The present invention has a complementary approach to solving the problem of object detection by use of segmentation. A grid system is used to partition a scene into a set of grid cells or tiles, and then the likelihood that a man-made object is imbedded in a natural environment is determined. Once an area of interest (AOI) is determined, the user can proceed to extract and identify the object imbedded in the AOI by using a segmentor-based and/or a matcher-based classifier. A possibility for the grid cell based object detection system is illustrated in U.S. Pat. No. 5,341,439, issued to the present inventor, hereby incorporated by reference.
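The partitioning of a scene into grid cells can be sketched as follows. This is only an illustration of the tiling step, not of the likelihood computation, which the text does not define. The function name `grid_cells` is hypothetical, and clipping the cells along the right and bottom edges to the scene boundary is an assumption.

```python
def grid_cells(height, width, cell_size):
    """Return (top, left, bottom, right) bounds of each grid cell, row by row.

    Bounds are half-open: rows top..bottom-1 and columns left..right-1.
    Assumption: cells on the right and bottom edges are clipped to the scene.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    cells = []
    for top in range(0, height, cell_size):
        for left in range(0, width, cell_size):
            bottom = min(top + cell_size, height)
            right = min(left + cell_size, width)
            cells.append((top, left, bottom, right))
    return cells


# A 128 x 96 scene with 64 x 64 cells yields four cells, two of them clipped.
cells = grid_cells(128, 96, 64)
```

Each returned cell could then be scored separately, and the highest scoring cells treated as candidate areas of interest.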
In addition, the present invention uses a mesotexture based matrix for object detection. The size of an object searching grid, such as 64×64 or 96×96, is defined by the feature extraction algorithm combined with the extraction of the mesotexture measure for each cell. The mesotexture based matrix is normalized using a ratio based operation with which the first element of a row or column is set to 1, the second element is divided by the first element, the third element is divided by the second element, and so on. The result of this pair-wise ratio based operation is a set of matrices: 1) a row-direction based transform, and 2) a column-direction based transform. Next, a set of mesotexture based statistics is devised to rank the cells in terms of the likelihood of the existence of a man-made object in a natural environment. One of the cell ranking criteria can be based on the magnitude of the normalized mesotexture values.
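The pair-wise ratio based normalization described above can be sketched in a few lines. Two points are assumptions rather than statements from the text: each element is divided by the original (not the already normalized) value of its predecessor, and all mesotexture values are nonzero. The function names are hypothetical.

```python
def ratio_normalize(values):
    """Set the first element to 1 and divide each later element by its predecessor.

    Assumption: the divisor is the predecessor's original value, and no value is zero.
    """
    if not values:
        return []
    normalized = [1.0]
    for previous, current in zip(values, values[1:]):
        normalized.append(current / previous)
    return normalized


def row_transform(matrix):
    """Apply the ratio normalization along each row."""
    return [ratio_normalize(list(row)) for row in matrix]


def column_transform(matrix):
    """Apply the ratio normalization along each column."""
    columns = [ratio_normalize(list(column)) for column in zip(*matrix)]
    # Transpose back so the result has the same layout as the input.
    return [list(row) for row in zip(*columns)]


# Made-up 2 x 3 matrix of mesotexture values, one value per grid cell.
mesotexture = [[2, 4, 8], [1, 3, 6]]
by_row = row_transform(mesotexture)
by_column = column_transform(mesotexture)
```

For this input, `by_row` is `[[1.0, 2.0, 2.0], [1.0, 3.0, 2.0]]` and `by_column` is `[[1.0, 1.0, 1.0], [0.5, 0.75, 0.75]]`. Values far from 1 mark cells whose mesotexture differs sharply from the neighboring cell, which is one plausible basis for ranking cells by magnitude.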
The present invention uses an IMaG system to perform a raster to vector conversion used to preserve the size, shape and even the topological information among regions/polygons. The vectorization process allows the user to retrieve objects on a feature attribute table representation level. The pseudo-English programming language of the IMaG system is a dual functioning language which simultaneously processes and queries. The user can extract any object that meets the user's criteria.
The present invention integrates a vectorizer and a rasterizer with the main processor. A base map is a segmented raster image, and the digital value of the interior pixels of each region corresponds to the region ID code. Each feature of a scene is vectorized by a totally automated process that uses the region ID code raster image as the input. The aforementioned vectorization process is totally reversible. Therefore, possible errors in size, shape and location can be detected. Errors are detected by comparing two feature attribute tables, one generated with the original image and the other generated by using the rasterized image with the vector data as the input. In addition, an expert's knowledge can be captured and then the newly-created intelligence can be inserted back to the original database, yielding an enhanced database.
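The error check based on two feature attribute tables can be illustrated with a simplified sketch. It does not perform the vectorization or rasterization itself; it only builds a small attribute table from a region ID raster and reports the regions whose attributes differ between the original raster and a round-trip raster. The chosen attributes (pixel area and bounding rows and columns) and the function names are assumptions for illustration.

```python
def feature_table(raster):
    """Build {region_id: attributes} from a raster whose pixels hold region ID codes.

    Assumed attributes: pixel area plus the bounding row and column ranges.
    """
    table = {}
    for r, row in enumerate(raster):
        for c, region_id in enumerate(row):
            record = table.setdefault(
                region_id, {"area": 0, "rows": (r, r), "cols": (c, c)}
            )
            record["area"] += 1
            record["rows"] = (min(record["rows"][0], r), max(record["rows"][1], r))
            record["cols"] = (min(record["cols"][0], c), max(record["cols"][1], c))
    return table


def changed_regions(original_table, round_trip_table):
    """Return the sorted region IDs whose attributes differ or that are missing."""
    region_ids = set(original_table) | set(round_trip_table)
    return sorted(
        region_id
        for region_id in region_ids
        if original_table.get(region_id) != round_trip_table.get(region_id)
    )


# Made-up region ID rasters: the round trip moved one pixel from region 2 to region 1.
original = [[1, 1, 2], [1, 2, 2]]
round_trip = [[1, 1, 2], [1, 1, 2]]
errors = changed_regions(feature_table(original), feature_table(round_trip))
```

An empty result would indicate that, for the attributes tracked, the round trip preserved size and location; here both regions are flagged because one pixel changed hands.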
The system is also designed for multi-band data, integrating image processing, multisource analysis and GIS into one single system. In the GIS community, information is partitioned into individual layers. Each object in each layer is equivalent to a segmented image which is represented by its boundary contour in terms of node, arc and polygon. The whole scene is described by a feature attribute table. The feature attribute table is typically in a DBF format that can be read by standard database management software, such as dBASE.