1. Field of the Invention
This invention relates generally to the field of verifying queries which may be applied to multimedia databases, hypermedia databases and image retrieval systems on the World Wide Web. The issuer of the query is then allowed to adjust the elements of the query based upon feedback provided to the issuer as a result of the verification.
2. Description of the Related Art
The contents of traditional database systems are precise. Hence, queries applied to such databases produce deterministic answers. Query processing in multimedia databases, by contrast, differs from query processing in traditional database systems. For example, image retrieval in multimedia databases requires a more comprehensive consideration of various types of visual characteristics, such as color and shape. In general, such visual characteristics cannot be specified completely. As a result, image matching in multimedia databases is based on similarity, rather than equality. Since image retrieval requires similarity based on both keyword (i.e., semantics) and visual characteristic comparisons, multimedia database query processing should be viewed as a combination of (1) information retrieval notions, described in Salton, "Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer," Addison-Wesley Publishing Company, Inc., 1989 (relevance feedback, inexact match, word mismatch, query expansion); (2) Object Relational Database Management System (ORDBMS) or Object Oriented Database Management System (OODBMS) database notions (recognition of specific concepts, variety of data types, spatial relationships between objects); and (3) computer-human interaction (CHI) notions (interactions with the multimedia databases to perpetually reformulate queries for homing in on target images).
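The distinction between equality-based and similarity-based matching can be illustrated with a minimal sketch. The example assumes that each image is summarized by a color histogram and uses histogram intersection as the similarity measure. Both choices, and the names `histogram_intersection` and `rank_by_similarity`, are illustrative assumptions and are not taken from any of the systems discussed herein.

```python
from typing import Dict, List, Sequence, Tuple


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Return a similarity in [0, 1] between two color histograms.

    Each histogram is normalized to sum to 1 before comparison, so images
    of different sizes can be compared. 1.0 means identical distributions.
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    total1, total2 = sum(h1), sum(h2)
    if total1 == 0 or total2 == 0:
        return 0.0  # an empty histogram matches nothing
    return sum(min(a / total1, b / total2) for a, b in zip(h1, h2))


def rank_by_similarity(
    query: Sequence[float], images: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Rank stored images by similarity to the query, best match first."""
    scored = [(name, histogram_intersection(query, h)) for name, h in images.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical three-bin histograms (e.g. red, green, blue pixel counts).
stored = {
    "sunset": [80, 10, 10],
    "forest": [10, 80, 10],
    "ocean": [10, 10, 80],
}
ranking = rank_by_similarity([70, 20, 10], stored)
```

An equality test would return no result for this query, since no stored histogram is identical to it, whereas the similarity ranking still places the closest image first.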
Relevant prior systems will now be discussed. The Virage Image Retrieval Engine is a system for image retrieval based on visual features, including color, shape, or texture and other domain specific features. See J. R. Bach et al., "The Virage Image Search Engine: An Open Framework for Image Management," Proceedings of the SPIE--The International Society for Optical Engineering: Storage and Retrieval for Still Image and Video Databases IV, February 1996. Virage has a query language based on structured query language (SQL), extended by user-defined data types and functions. Query By Image Content (QBIC), developed at the International Business Machines Corporation (IBM), is another system that supports image retrieval using visual examples. See Flickner et al., "Query by Image and Video Content: The QBIC System," IEEE Computer, 28(9):23-32, September 1995. Virage and QBIC both support image matching and keyword-based retrieval functionality on the whole image level. However, neither of them provides semantics-based access to objects in images. Another problem with both the QBIC system and the Virage system relates to reformulation granularity. Both QBIC and Virage allow users to reformulate queries only by selecting an entire image from the result. Thus, the reformulation granularity is an entire image. These systems fail to provide a scheme which supports a finer granularity of reformulation, e.g. using objects within an image.
Garlic (Roth et al., "The Garlic Project," Proceedings of the 1996 ACM SIGMOD Conference, May 1996) and PESTO (Carey et al., "PESTO: An Integrated Query/Browser for Object Databases," Proceedings of the 1996 VLDB Conference, September 1996) are two other projects at IBM related to multimedia, which focus on integrating and browsing/querying images in heterogeneous and distributed information sources, respectively. PESTO allows users to specify tables and attributes for join/project and other aggregation functionality for multimedia information retrieval.
SCORE is a similarity-based image retrieval system developed at the University of Illinois at Chicago. See Alp, et al., "Design, Implementation and Evaluation of SCORE," Proceedings of the 11th International Conference on Data Engineering, March 1995, IEEE; Prasad, et al., "Similarity based Retrieval of Pictures Using Indices on Spatial Relationships," Proceedings of the 1995 VLDB Conference, Sep. 23-25 1995. This work focuses on the use of a refined entity relation (E-R) model to represent the contents of pictures and the calculation of similarity values between E-R representations of images stored and query specifications. However, SCORE does not support image matching.
VisualSeek is a content-based image query system developed at Columbia University. See Smith et al., "VisualSeek: A Fully Automated Content-based Image Query System," Proceedings of the 1996 ACM Multimedia Conference, pages 87-98, 1996. VisualSeek uses color distributions to retrieve images. Although VisualSeek is not object-based, it provides region-based image retrieval. Users can specify how color regions shall be placed with respect to each other. VisualSeek also provides image comparisons and sketches for image retrieval. However, VisualSeek is designed for image matching, and does not support retrieval based on semantics at either the image level or the object level.
MQL is a multimedia query language. See Kau et al., "MQL--A Query Language for Multimedia Databases," Proceedings of 1994 ACM Multimedia Conference, pages 511-516, 1994. The syntax of MQL is select <A><V> from <R> where <C>, in which <A> is a list of attributes to be retrieved, <V> is the result of version, <R> is the domain class, and <C> is a condition. Kau et al. claim MQL can support complex object queries, version queries, and nested queries (e.g. IN). MQL also supports a "contain" predicate through pattern matching on images, voice, or text. However, additional parameters to control the relaxation of query processing are not provided.
Classification has long been used to increase the efficiency of searches on large scale databases by filtering out unrelated information. Hirata et al., "The Concept of Media-based Navigation and its Implementation on Hypermedia System 'Miyabi'," NEC Research & Development, 35(4):410-420, October 1994 focuses on color information. Hirata et al. extract color values from images and map them onto HLS color spaces. Based on the resulting clusters, users can access image directories or filter out images for searching.
Del Bimbo et al., "Shape Indexing by Structural Properties," Proceedings of the 1997 IEEE Multimedia Computing and Systems Conference, pages 370-378, June 1997 focuses on clustering by shape similarity. Based on multi-scale analysis, Del Bimbo et al. extract the hierarchical structure of shape. According to this hierarchical structure, Del Bimbo et al. attempt to provide more effective search capabilities. However, this method is based on boundary analysis and assumes that boundaries are extracted correctly. Images from the Web usually include many elements and, thus, it is hard to determine the boundaries of primary objects.
QBIC clusters images using feature vectors. Carson et al., "Color and texture-based image segmentation using em and its application to image querying and classification," Submitted to IEEE Transaction on Pattern Analysis and Machine Intelligence, 1997, extract objects from images based on color and texture. Using the combination of extracted objects and their attributes (top two colors and texture), Carson et al. try to categorize the images into several groups. In this work, shape and positional information are not considered.
Several problems exist with the prior systems. First, many existing systems, e.g. QBIC and Virage, support query reformulation by providing query results as feedback for users to select one of the candidate images as the new query. Based on experiments on a database of 1,000 images, an average of 25 query reformulations is required using this method. The query model introduced in Li et al., "Facilitating Multimedia Database Exploration through Visual Interfaces and Perpetual Query Reformulations," Proceedings of the 23rd International Conference on Very Large Data Bases, pages 538-547, August 1997, allows users to reformulate queries based on system feedback of three different types of criteria. However, the feedback is based only on semantics and is not based on visual characteristics. This reference to Li et al., and each of the references discussed throughout, is hereby incorporated by reference herein. In Li et al., the system feedback is generated through expensive query processing. Since the users may reformulate queries perpetually for homing in on target results, such frequent query processing may put heavy loads onto the system.
Second, coming up with too many or too few matches is not satisfactory. In a typical image retrieval process, if there are too many matching images, the user tightens, or refines, the query criteria. If there are too few or no matches, the user relaxes the query criteria. The reformulation proceeds until the number of candidate images is within an acceptable range so that the user can browse through the candidate images.
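The tighten-and-relax cycle described above can be sketched as a simple loop. The sketch assumes that the query criteria reduce to a single similarity threshold and that each stored image already has a similarity score for the current query. The function name `reformulate`, the acceptable range of 5 to 20 candidates, and the step size are hypothetical values chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def reformulate(
    scores: Sequence[float],
    threshold: float = 0.5,
    min_hits: int = 5,
    max_hits: int = 20,
    step: float = 0.05,
    max_rounds: int = 50,
) -> Tuple[float, List[int]]:
    """Adjust a similarity threshold until the match count is browsable.

    Returns the final threshold and the indices of the matching images.
    The loop stops after max_rounds even if no acceptable count is found.
    """

    def matches(t: float) -> List[int]:
        return [i for i, s in enumerate(scores) if s >= t]

    for _ in range(max_rounds):
        hits = matches(threshold)
        if len(hits) > max_hits:
            # Too many matches: tighten (refine) the criteria.
            threshold = min(1.0, threshold + step)
        elif len(hits) < min_hits:
            # Too few or no matches: relax the criteria.
            threshold = max(0.0, threshold - step)
        else:
            return threshold, hits
    return threshold, matches(threshold)


# Hypothetical similarity scores for 100 stored images.
scores = [i / 100 for i in range(100)]
final_threshold, candidates = reformulate(scores)
```

Each pass through the loop corresponds to one reformulation by the user, which is why a large number of passes translates directly into repeated query processing.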
Third, there is often heterogeneity between a user's terminology and the semantics specified in the multimedia databases. The problem of word mismatches arises because the vocabulary employed by the database authors and users may be different. This problem further expands in image retrieval since there may be a mismatch in both visual characteristics and terminology. For example, a user may want to retrieve images containing a car. However, the database may only contain the semantics "transportation," "automobile," and "truck." In this case, the system needs to relate the images in the database with the user's query based on semantics association during the query specification phase or in the query processing phase.
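The car example can be made concrete with a minimal sketch of semantics association. The association table below is hand-built and hypothetical, as are its weights and the name `expand_term`. A practical system would presumably draw such associations from a thesaurus or similar lexical resource rather than a fixed table.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical association weights between a query term and related terms.
# The values are illustrative assumptions, not measured quantities.
ASSOCIATIONS: Dict[str, Dict[str, float]] = {
    "car": {"automobile": 1.0, "truck": 0.6, "transportation": 0.4},
}


def expand_term(
    term: str, database_terms: Set[str], min_weight: float = 0.0
) -> List[Tuple[str, float]]:
    """Relate a user's term to the semantics actually stored in the database.

    Returns (database_term, weight) pairs, strongest association first.
    An exact match on the user's own term always has weight 1.0.
    """
    candidates = dict(ASSOCIATIONS.get(term, {}))
    candidates[term] = 1.0
    related = [
        (db_term, weight)
        for db_term, weight in candidates.items()
        if db_term in database_terms and weight >= min_weight
    ]
    return sorted(related, key=lambda pair: (-pair[1], pair[0]))


# The database holds none of the user's own vocabulary.
stored_semantics = {"transportation", "automobile", "truck"}
related_terms = expand_term("car", stored_semantics)
```

In this sketch the query for "car" is related to "automobile", "truck", and "transportation" in decreasing order of association, and raising `min_weight` restricts the expansion to the closer terms.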