1. Field of the Invention
The present invention relates generally to information retrieval systems, and more particularly, to information retrieval systems for retrieval of multimedia information.
2. Background of the Invention
Current computer systems enable users to create complex documents that combine both text and images in an integrated whole. In addition, computer users can now insert digitally recorded audio or video directly into such documents, to create rich multimedia documents. In these documents the image, audio, or video components are either directly embedded into the data of the document at specific positions, or are stored external to the document and referenced with referencing data. An example of the former construction is a Rich Text Format (RTF) document, which embeds image data directly into the document. An example of the latter construction is a HyperText Markup Language (HTML) document, which uses references to external image, audio, or video files to construct the document, where the references have specific locations in the text data. Generally, documents in which two or more different types of multimedia components are embedded or referenced are herein termed "compound documents."
Separately, both text and image retrieval databases are known, and generally operate independently of each other. Text retrieval systems are designed to index documents based on their text data, and to retrieve text documents in response to text-only queries. Image retrieval systems are designed to index images based either on image characteristics directly (e.g. color, texture, and the like), or on textual keywords provided by users which describe or categorize the image (e.g. "sunset", "blue sky", "red"), and to retrieve images in response to a query containing one or more of these items of data. In particular, the images exist in the database independently of any compound document, and the keyword labels typically form merely another column or data attribute of the image, but do not come from the text of a compound document itself. Further, the index of images also exists in the database independently of the text or column indexes. There is no single index that considers the whole of the compound document and all of its multimedia components. For example, a conventional, relational multimedia database might use an image table with columns for image ID, descriptive text string, image data, and category label(s). A user would then request an image by specifying some text keywords or category labels which are processed into a query such as:
SELECT ID
FROM image table
WHERE TEXT LIKE "sunrise"
AND IMAGE LIKE "IMAGE ID FOO"
AND CATEGORY "HISTORY"
Matching on the "image like" operator would then use some type of image data comparison (e.g. matching of color histograms) which is already indexed into the database, along with conventional text matching. However, the result is still merely the retrieval of matching images, not compound documents containing images (let alone other types of multimedia data). An example of an image retrieval system that merely retrieves images in response to image characteristics or text labels is U.S. Pat. No. 5,579,471 issued to IBM for their "QBIC" image retrieval system.
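As an illustration of the kind of image data comparison mentioned above, the following is a minimal sketch of color histogram matching by histogram intersection. The function names, the number of bins, and the choice of histogram intersection as the similarity measure are assumptions made for illustration only; they are not taken from any particular conventional system.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Quantize (r, g, b) pixels into coarse bins and return a normalized histogram.

    The bin count is an illustrative assumption, not a value from the text.
    """
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bin_key: n / total for bin_key, n in counts.items()}


def histogram_intersection(h1, h2):
    """Return a similarity in [0, 1]: the histogram mass shared by h1 and h2."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())


# Three tiny hypothetical "images", each given as a list of RGB pixels.
red_image = [(255, 0, 0)] * 4
blue_image = [(0, 0, 255)] * 4
mixed_image = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2

red_hist = color_histogram(red_image)
blue_hist = color_histogram(blue_image)
mixed_hist = color_histogram(mixed_image)
```

In this sketch an image compared with itself scores 1.0, two images with no colors in common score 0.0, and the half-red, half-blue image scores 0.5 against the all-red image. Note that the comparison ranks images only; nothing in it refers to any document that contains them.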
Another limitation of conventional systems is that they do not expand a user's query with multiple different types of multimedia data which is then subsequently used to retrieve matching documents. For example, current systems do not take a user's text query, add image data (or portions thereof, e.g. a color histogram) to the query, and then search for documents, including text and images, that satisfy the expanded query.
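To make the notion of query expansion concrete, the following is a minimal sketch of what expanding a text query with image data and then searching compound documents could look like. Every name here (`Query`, `CompoundDocument`, `expand_query`, the `term_to_features` mapping, and the additive scoring) is a hypothetical choice made for illustration; the sketch does not describe any existing system or any particular implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    text_terms: list
    image_features: list = field(default_factory=list)  # normalized histograms (dicts)


@dataclass
class CompoundDocument:
    doc_id: str
    text: str
    image_features: list  # one normalized histogram (dict) per image in the document


def expand_query(query, term_to_features):
    """Add image features associated with the query's text terms (hypothetical mapping)."""
    added = [term_to_features[t] for t in query.text_terms if t in term_to_features]
    return Query(list(query.text_terms), query.image_features + added)


def similarity(h1, h2):
    """Histogram intersection between two normalized histograms."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())


def score(doc, query):
    """Sum of matched text terms plus, per query image feature, the best image match."""
    words = set(doc.text.lower().split())
    text_score = sum(1 for t in query.text_terms if t.lower() in words)
    image_score = sum(
        max((similarity(qf, df) for df in doc.image_features), default=0.0)
        for qf in query.image_features
    )
    return text_score + image_score


def retrieve(docs, query):
    """Return ids of documents with a nonzero score, best first."""
    ranked = sorted(docs, key=lambda d: score(d, query), reverse=True)
    return [d.doc_id for d in ranked if score(d, query) > 0]


# Hypothetical data: histogram bins are named by color for readability.
docs = [
    CompoundDocument("A", "sunset over the bay", []),
    CompoundDocument("B", "evening photo", [{"orange": 1.0}]),
    CompoundDocument("C", "stock report", [{"blue": 1.0}]),
]
text_only = Query(["sunset"])
expanded = expand_query(text_only, {"sunset": {"orange": 1.0}})
```

With the text-only query, only document A is retrieved, because it is the only one whose text contains the term. With the expanded query, document B is retrieved as well, on the strength of its image alone, while document C still does not match.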
Accordingly, it is desirable to provide a system, method, and software product that retrieves compound documents in response to queries that include various multimedia elements in a structured form, including text, image features, audio, or video.