Recent methods for retrieving images and videos by content from large archives utilize feature descriptors and feature comparison metrics in order to index the visual information. Examples of such content-based retrieval systems include the IBM Query by Image Content (QBIC) system, detailed in “Query by image and video content: The {QBIC} system,” by M. Flickner, et al, “IEEE Computer”, 28(9):23-32, (September 1995); the Virage visual information retrieval system, detailed in “Virage image search engine: an open framework for image management,” by J. R. Bach, et al, “Symposium on Electronic Imaging: Science and Technology—Storage & Retrieval for Image and Video Databases {IV}”, volume 2670, pages 76-87, (1996); the MIT Photobook, detailed in “Tools for content-based manipulation of image databases,” by A. Pentland, et al, “Proceedings of the SPIE Storage and Retrieval Image and Video Databases II” (February 1994); the Alexandria project at UCSB detailed by B. S. Manjunath and W. Y. Ma, in “Texture features for browsing and retrieval of image data,” in the “IEEE Trans. Pattern Analysis Machine Intell. Special Issue on Digital Libraries”, Vol. 8 (1996), and by M. Beatty and B. S. Manjunath in “Dimensionality reduction using multidimensional scaling for image search,” published in the “Proc. IEEE International Conference on Image Processing” (October 1997); and, the IBM/NASA Satellite Image Retrieval System, detailed in “Progressive content-based retrieval from distributed image/video databases,” by V. Castelli, et al, “Proceeding of the International Symposium of Circuit and System” (1997).
In the prior art systems, the feature comparison between the search target and the feature vectors stored in the database is typically based upon a simple fixed metric, such as the Euclidean distance or the quadratic distance (see: “Visualseek: A fully automated content-based image query system,” by J. R. Smith and S. F. Chang in the “Proc. International Conference on Image Processing” (1996)). While these simple metrics may minimize the computational requirements for feature comparison, they typically do not correspond well to human perceptual distance, nor can they adapt to the changing environments that commonly arise in various scientific applications. Specifically, it is desirable to provide a system and method that can accommodate the following diverse requirements:
        Retrieving Synthetic Aperture Radar (SAR) satellite images and identifying regions in the images with a texture type (e.g., ice) similar to the search target;
        Retrieving one-meter resolution satellite images and identifying regions in the images with spectral features (e.g., crop type) similar to the search target;
        Retrieving LANDSAT Thematic Mapper (TM) satellite images and identifying regions in the images with a combination of spectral and texture features (e.g., indicative of similar terrain type) which are similar to the search target.
The foregoing feature comparisons may be implemented for the following applications:
        Environmental epidemiology: wherein the system seeks to retrieve locations of houses which are vulnerable to epidemic diseases such as Hantavirus and Dengue fever based on a combination of environmental factors (e.g., isolated houses that are near bushes or wetlands) and weather patterns (e.g., a wet summer followed by a dry summer);
        Precision farming: wherein it is desirable to (1) retrieve locations of crop developments that are exposed to diseases (for example, clubroot, a soil-borne disease that infects cauliflower crops; cauliflower and clubroot have recognized spectral signatures, and exposure results from their spatial and temporal proximity); (2) retrieve those fields which have abnormal irrigation; or (3) retrieve those regions which have higher than normal soil temperature;
        Precision forestry: wherein the system may seek to (1) calculate areas of forests that have been damaged by hurricanes, forest fires, or storms, and (2) estimate the yield of a particular forest;
        Petroleum exploration: to retrieve those regions which exemplify specific characteristics in the collection of seismic data, core images, and other sensory data;
        Insurance: for which a system may be called upon to (1) retrieve those regions which may require immediate attention due to natural disasters such as earthquakes, forest fires, hurricanes, and tornadoes; or (2) retrieve those regions that have a higher than normal claim rate (or amount) correlated with geography (close to coastal regions, close to mountains, in high crime rate regions, etc.);
        Medical image diagnosis: for retrieval of all MRI images of brains that have tumors located within the hypothalamus, where such tumors are characterized by shape and texture, and the hypothalamus is characterized by shape and spatial location within the brain;
        Real estate marketing: wherein the system may be required to retrieve all houses that are near a lake (color and texture), have a wooded yard (texture) and are within 100 miles of skiing (mountains are also given by texture); and
        Interior design: for use in retrieving all images of patterned carpets which consist of a specific spatial arrangement of color and texture primitives.
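The two fixed metrics named above, the Euclidean distance and the quadratic distance, can be illustrated with a minimal sketch. The function names, the three-element feature vectors, and the similarity matrix `cross_bin` are assumptions chosen for illustration and do not come from any of the cited systems.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def quadratic_distance(x, y, A):
    """Quadratic-form distance sqrt((x - y)^T A (x - y)).

    A is a fixed matrix, assumed symmetric and positive semi-definite,
    whose off-diagonal entries express similarity between feature bins.
    """
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    Ad = [sum(A[i][j] * d[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(d[i] * Ad[i] for i in range(n)))

# Hypothetical three-bin feature vectors (e.g., a tiny color histogram).
query = [0.2, 0.5, 0.3]
stored = [0.1, 0.7, 0.2]

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Assumed similarity matrix: neighboring bins are treated as partly alike.
cross_bin = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]

d_euclid = euclidean_distance(query, stored)
d_quad_identity = quadratic_distance(query, stored, identity)
d_quad_cross = quadratic_distance(query, stored, cross_bin)
print(d_euclid, d_quad_identity, d_quad_cross)
```

With the identity matrix the quadratic distance reduces to the Euclidean distance. In both cases the metric is fixed in advance and does not change in response to the user or the task, which is the limitation discussed above.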
There are some fundamental problems that are related to similarity search tasks, including the following:
(1) Different similarity measures capture different aspects of perceptual similarity between images. When similarity retrieval is used for applications such as environmental epidemiology or medical diagnosis, each task might impose a different similarity measure. Ultimately, the similarity measure is tied to the task it needs to perform.
(2) Different features contribute unequally to the computation of the similarity measure. When two images are regarded as similar, they may be similar in terms of one or more subsets of the features.
(3) Incomplete specification of the query: similarity queries are usually difficult to specify completely because of the limitations of the user interface. Consequently, the system needs to interact with the user in order to learn what the user really means.
In order to improve the results from feature comparison, the aforementioned VisualSEEk project at Columbia University and the Alexandria project have developed linear transformations of texture feature spaces. These approaches use fixed transformations of the feature space based on a linear transformation determined by a predefined training set. Note that both the database and the query must undergo the same transformation. Unfortunately, this approach does not allow for adaptation to incremental user feedback, and thus cannot take advantage of the additional query-specific knowledge which can be provided by the user.
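A fixed linear transformation of the feature space can be sketched as follows. The matrix `W`, the two-element feature vectors, and the image names are hypothetical; in the approaches described above the transformation would be derived from a predefined training set rather than written by hand.

```python
import math

def transform(W, v):
    """Apply the linear map W (a list of rows) to feature vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def nearest(db, q):
    """Name of the database entry closest to q in Euclidean distance."""
    return min(db, key=lambda name: math.dist(db[name], q))

# Hypothetical fixed transform: stretch feature 0, shrink feature 1.
W = [[2.0, 0.0],
     [0.0, 0.5]]

database = {"img_a": [1.0, 0.0], "img_b": [0.0, 1.5]}
query = [0.0, 0.0]

# Ranking in the original feature space.
raw_best = nearest(database, query)

# The same W is applied to every stored vector and to the query.
transformed_db = {name: transform(W, v) for name, v in database.items()}
transformed_best = nearest(transformed_db, transform(W, query))

print(raw_best, transformed_best)
```

In this example the transformation changes which image ranks first, but because `W` is fixed before any query is issued, it cannot reflect feedback given during a particular search.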
Research on relevance feedback for textual database systems has focused on adaptively refining a user's initial query to more accurately select the desired data. This is usually an iterative refinement process in which the user indicates the relevance or irrelevance of the retrieved items. While relevance feedback has been extensively studied in textual retrieval, there has been little investigation of its use for image retrieval. An approach towards applying this technique to an image database has been proposed (see: “Interactive Learning through a Society of Models”, by T. P. Minka, et al, Technical report 349, MIT Media Lab, 1995). In the approach taught therein, the system forms disjunctions between initial image feature groupings according to both positive and negative feedback given by the users. A consistent grouping is found when the features are located within the positive examples. A different approach was later proposed by one of the authors, leading to the development of PicHunter, as detailed by I. J. Cox, et al, in “Pichunter: Bayesian relevance feedback for image retrieval,” published in the “Proceeding of the International Conference on Pattern Recognition”, pages 361-369, IEEE, 1996, and in “An Optimized Interactive Strategy for Bayesian Relevance Feedback,” SPIE Photonics West, San Jose, 1998. In PicHunter, the history of user selections is used to construct the system's estimate of the user's goal image. A Bayesian learning system based on a probabilistic model of the user behavior is combined with user selection to estimate the probability of each image in the database. Instead of revising the queries, PicHunter tries to refine the answers in reply to user feedback. Alternatively, an approach has also been proposed to learn both feature relevance and similarity measure simultaneously (see: “Learning Feature Relevance and Similarity Metrics in Image Databases,” by Bhanu, et al, “IEEE Computer Vision and Pattern Recognition,” June 1998). In this approach, the local feature relevance is computed from a least-squares estimate, and a connectionist reinforcement learning approach has been adopted to iteratively refine the weights.
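The idea of refining answers from the history of user selections can be illustrated with a single Bayesian update over candidate target images. This is only a sketch in the spirit of the Bayesian approach described above, not the published PicHunter model: the user model (a displayed image is selected with probability proportional to exp(-distance to target / sigma)), the parameter `sigma`, and the toy feature vectors are all assumptions.

```python
import math

def bayes_update(prior, features, shown, chosen, sigma=1.0):
    """Return the posterior probability that each image is the user's target.

    prior    : dict image_id -> prior probability
    features : dict image_id -> feature vector
    shown    : list of image ids displayed to the user
    chosen   : the displayed image id the user selected
    """
    posterior = {}
    for target in prior:
        # Assumed user model: images closer to the target are more
        # likely to be picked from the displayed set.
        scores = {
            s: math.exp(-math.dist(features[s], features[target]) / sigma)
            for s in shown
        }
        likelihood = scores[chosen] / sum(scores.values())
        posterior[target] = prior[target] * likelihood
    total = sum(posterior.values())
    return {k: v / total for k, v in posterior.items()}

# Toy database: two clusters of images in a 2-D feature space.
features = {"a": [0.0, 0.0], "b": [0.1, 0.0], "c": [1.0, 1.0], "d": [0.9, 1.0]}
prior = {name: 0.25 for name in features}

# The user is shown "a" and "c" and selects "c".
posterior = bayes_update(prior, features, shown=["a", "c"], chosen="c")
best_guess = max(posterior, key=posterior.get)
print(posterior, best_guess)
```

Repeating the update with the posterior as the next prior accumulates the selection history. Note that the query itself is never rewritten; only the probability assigned to each database image changes.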
All of these methods provide some improvement in image retrieval performance. However, none offers a flexible framework that allows the incremental revision of the query (for a given context), the similarity measure, the relevance of individual features, and the entire feature space simultaneously. Furthermore, these systems do not take into account the simultaneous need for efficient indexing in a high-dimensional feature space, which is commonly required for the above-mentioned applications. Multidimensional indexing is fundamental to spatial databases, which are widely applicable to Geographic Information Systems (GIS), Online Analytical Processing (OLAP) for decision support using a large data warehouse, and multimedia databases where high-dimensional feature vectors are extracted from the image and video data.
Multidimensional indexes can be used to answer different types of queries, including:
        finding record(s) with specified values of the indexed columns (exact search);
        finding record(s) that are within [a1 . . . a2], [b1 . . . b2], . . . , [z1 . . . z2] where a, b and z represent different dimensions (range search); and
        finding the k most similar records to a user-specified template or example (k-nearest neighbor search).
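The three query types listed above can be made concrete with a brute-force sketch that scans every record. The record values and function names are hypothetical, and no index structure is used; a multidimensional index aims to return the same answers while examining far fewer records.

```python
import heapq
import math

# Hypothetical two-dimensional records (one tuple per database row).
records = [(1.0, 2.0), (2.0, 2.0), (4.0, 5.0), (6.0, 1.0), (2.5, 2.5)]

def exact_search(recs, values):
    """Records whose indexed columns equal the specified values."""
    return [r for r in recs if r == values]

def range_search(recs, bounds):
    """Records lying within [lo, hi] in every dimension."""
    return [
        r for r in recs
        if all(lo <= x <= hi for x, (lo, hi) in zip(r, bounds))
    ]

def knn_search(recs, template, k):
    """The k records closest to the template in Euclidean distance."""
    return heapq.nsmallest(k, recs, key=lambda r: math.dist(r, template))

exact_hits = exact_search(records, (2.0, 2.0))
range_hits = range_search(records, [(1.5, 3.0), (2.0, 3.0)])
knn_hits = knn_search(records, (2.0, 2.0), k=2)
print(exact_hits, range_hits, knn_hits)
```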
During the execution of a database query, the database search program accesses part of the stored data and part of the indexing structure. The amount of data accessed depends upon the type of query and the data provided by the user, as well as upon the efficiency of the indexing algorithm. Generally, large databases are configured such that the data and at least part of the indexing structure reside on the larger, slower and cheaper part of the memory hierarchy of the computer system, usually consisting of one or more hard disks. During the search process, part of the data and part of the indexing structure are loaded into the faster parts of the memory hierarchy, such as the main memory and one or more levels of cache memory. The faster parts of the memory hierarchy are generally more expensive and thus comprise a smaller percentage of the storage capacity of the memory hierarchy. A program that uses instructions and data that can be completely loaded into one or more levels of cache memory is faster and more efficient than a program that in addition uses instructions and data that reside in the main memory, which in turn is faster than a program that also uses instructions and data that reside on the hard disks. Technological limitations are such that the cost of cache and main memory makes it too expensive to build computer systems with enough main memory or cache to completely contain large databases.
Thus, there is a need for an improved technique for indexing image and other unstructured data, which technique generates indexes of such a size that most or all of the index can reside in main memory at any time, and which limits the amount of data to be transferred from the disk to the main memory during the search process.
Several well known spatial indexing techniques, such as R-trees, can be used for range and nearest neighbor queries. Descriptions of R-trees can be found, for example, in “R-trees: A Dynamic index structure for spatial searching,” by A. Guttman, “ACM SIGMOD Conf. on Management of Data”, Boston, Mass. (June, 1984). The efficiency of these techniques, however, deteriorates rapidly as the number of dimensions of the feature space grows, since the search space becomes increasingly sparse. For instance, it is known that methods such as R-trees are not useful when the number of dimensions is larger than 8, where the usefulness criterion is the time to complete an indexed search request compared to the time required by a simple strategy that would complete the request by sequentially analyzing every record in the database. The inefficiency of the usual indexing techniques in high dimensional spaces is the consequence of a well-known phenomenon called the “curse of dimensionality,” which is described, for instance, in “From Statistics to Neural Networks,” NATO ASI Series, vol. 136, Springer-Verlag, 1994, by V. Cherkassky, J. H. Friedman, and H. Wechsler. The relevant consequence of the curse of dimensionality is that clustering the index space into hypercubes is an inefficient method for feature spaces with a higher number of dimensions.
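One way to see why hypercube-based partitioning degrades can be shown with a short calculation. It assumes, purely for illustration, that records are spread uniformly over a unit hypercube; real feature data need not be uniform. Under that assumption, a sub-hypercube that encloses a fraction p of the records must have edge length p^(1/d) in d dimensions.

```python
def edge_length_for_fraction(fraction, dims):
    """Edge length of a sub-hypercube that encloses the given fraction
    of points uniformly distributed in a unit hypercube of `dims` dimensions."""
    return fraction ** (1.0 / dims)

# Edge length needed to enclose 1% of the data as dimensionality grows.
for dims in (1, 2, 8, 32):
    edge = edge_length_for_fraction(0.01, dims)
    print(f"{dims:>2} dimensions: edge length {edge:.3f}")
```

In two dimensions a cell covering 1% of the data spans one tenth of each axis, while in eight dimensions it spans more than half of each axis. A cell that wide overlaps most range or nearest neighbor queries, so the index excludes few records and a sequential scan becomes competitive.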
Because of the inefficiency associated with using existing spatial indexing techniques for indexing a high-dimensional feature space, techniques well known in the art exist to reduce the number of dimensions of a feature space. For example, the dimensionality can be reduced either by variable subset selection (also called feature selection) or by singular value decomposition followed by variable subset selection, as taught, for instance, by C. T. Chen, Linear System Theory and Design, Holt, Rinehart and Winston, Appendix E, 1984. Variable subset selection is a well known and active field of study in statistics, and numerous methodologies have been proposed (see e.g., “An Optimal Selection of Regression Variables” by Shibata, et al, “Biometrika”, vol. 68, No. 1, pp. 45-54, 1981). These methods are effective in an index generation system only if many of the variables (columns in the database) are highly correlated. Such a correlation assumption, however, cannot generally be made for real world databases. Consequently, there is a need for techniques that allow incremental revision of the user query or feature space based on user feedback, while supporting efficient indexing in a high dimensional feature space.