Multi-media objects carry a great deal of information, and as multi-media technology grows, there is an increasing demand for a system that allows users to easily describe, archive, search and retrieve these objects. Some conventional methods and their limitations are described below.
In the past, people have used shoe boxes, albums and the like to archive images, with search and retrieval relying on the user's memory. Stock agencies have used index cards to keep track of stock images, with search and retrieval relying on the experience and preferences of their personnel. Such methods of archiving and retrieving images are difficult, time-consuming and expensive. They are also subjective in nature.
As computers became popular and more images were stored on-line, a keyword-based approach was developed. Keyword representations can be created either manually or automatically. In the manual approach, a set of keywords is assigned to each image in the database. The keywords describe the image content of interest (e.g. objects, events, concepts, places, activities, etc.). The KODAK PICTURE EXCHANGE (KPX) uses this approach. A shortcoming of this approach is that a multi-media object, in this instance an image, cannot always be described by a disjoint set of keywords. This method of image retrieval depends on an exact match between a keyword used in the description and a keyword used in the search, and the keywords used to describe or retrieve an image may change from user to user. Some incremental improvements can be made to this method by use of a thesaurus.
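The exact-match limitation, and the incremental gain from a thesaurus, can be illustrated with a minimal Python sketch. The index contents, the thesaurus entries and the function names below are hypothetical and chosen only for illustration; they do not describe how KPX or any other named system is implemented.

```python
# Hypothetical toy index: image id -> manually assigned keywords.
INDEX = {
    "img1": {"car", "street", "night"},
    "img2": {"automobile", "road"},
    "img3": {"horse", "field"},
}

# Hypothetical thesaurus: each word maps to a set of synonyms.
THESAURUS = {
    "car": {"automobile"},
    "automobile": {"car"},
    "street": {"road"},
    "road": {"street"},
}


def search_exact(query_words, index=INDEX):
    """Return ids of images whose keyword set contains every query word."""
    query = {w.lower() for w in query_words}
    return sorted(img for img, keywords in index.items() if query <= keywords)


def search_with_thesaurus(query_words, index=INDEX, thesaurus=THESAURUS):
    """Like search_exact, but a query word also matches any of its synonyms."""
    results = []
    for img, keywords in index.items():
        matched = True
        for word in query_words:
            word = word.lower()
            # The word itself plus its synonyms are all acceptable matches.
            alternatives = {word} | thesaurus.get(word, set())
            if not (alternatives & keywords):
                matched = False
                break
        if matched:
            results.append(img)
    return sorted(results)
```

With this toy data, searching for "car" by exact match misses the image that one user labeled "automobile", while the thesaurus lookup finds both. The thesaurus only helps for synonyms it happens to list, which is why the improvement is incremental.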
In the automatic approach, keywords are selected from within the document itself based on statistics pertaining to the relative frequency of word occurrence. This approach is more suitable for document retrieval applications where a large amount of text is available to obtain accurate statistics, such as in the area of newspaper article retrieval. Many text retrieval engines have been developed using this approach. However, in the case of images, the caption will typically be a sentence or two, which is not enough to extract meaningful statistics. Another limitation of the keyword-based technique for image retrieval is that only the words, and not their meaning or context, are taken into account. This makes the technique unsuitable for applications in which only sparse text is available to describe an image.
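A minimal Python sketch shows why frequency statistics are uninformative for short captions. The stop-word list, the function name `top_keywords` and the sample texts are assumptions made for this illustration, not part of any system named above.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list.
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "and", "is", "to", "at"}


def top_keywords(text, k=3):
    """Return the k most frequent non-stop words with their counts."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(k)


# A typical caption: every remaining word occurs exactly once,
# so frequency gives no basis for ranking one word above another.
caption = "A man on a horse at the beach"

# A longer text (stand-in for an article) produces usable statistics.
article = "The horse ran. The horse jumped. A man watched the horse and the man cheered."
```

For the caption, every candidate keyword has a count of one, so the ranking is arbitrary. For the longer text, "horse" clearly dominates, which is the situation the automatic approach was designed for.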
Images can also be searched and retrieved using image content analysis techniques. Image content attributes are defined using color, texture, shape and the like. Existing systems that perform image content analysis include QBIC from IBM and Virage from Virage Corporation. The drawback of this approach is that it only allows similarity-type search and retrieval, such as responding to queries of the form "Find me images like this one . . . ".
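As an illustration of a similarity-type query, the following Python sketch compares images by a coarse color histogram using histogram intersection. This is a generic textbook technique chosen for brevity; it is not a description of how QBIC or Virage work, and the function names and pixel data are hypothetical.

```python
def color_histogram(pixels, bins=4):
    """Normalized histogram over quantized (r, g, b) values in 0..255."""
    step = 256 // bins
    hist = {}
    for r, g, b in pixels:
        key = (r // step, g // step, b // step)
        hist[key] = hist.get(key, 0) + 1
    total = len(pixels)
    return {key: count / total for key, count in hist.items()}


def similarity(h1, h2):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(value, h2.get(key, 0.0)) for key, value in h1.items())


# Hypothetical four-pixel "images".
red_image = [(255, 0, 0)] * 4
blue_image = [(0, 0, 255)] * 4
mixed_image = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2
```

A query of this kind can rank `mixed_image` as closer to `red_image` than `blue_image` is, but it has no way to answer a request expressed in words, such as a request for a particular person or event.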
The University of Buffalo has developed a system, PICTION, which uses natural language captions to label human faces in an accompanying newspaper photograph. A key component of the system is the utilization of spatial and characteristic constraints (derived from captions) in labeling face candidates (generated by a face locator). The system is limited to only identifying faces based upon the spatial constraints defined in the caption, for example "John Doe is to the left of Jane Doe . . . ".
Anil Chakravarthy at MIT has developed a program as part of his thesis "Information Access and Retrieval with Semantic Background Knowledge" for retrieving captions of pictures and video clips using natural language queries. This thesis presents a limited framework for structured representation through the incorporation of semantic knowledge. However, the program only accepts images accompanied by well-formed single-sentence descriptions. Queries must also be well-formed single-sentence descriptions.
U.S. Pat. No. 5,493,677 discloses a natural language archival and retrieval system for images. This patent discloses inputting a search query in a natural language and then searching for archived images. It identifies names, locations and noun phrases in the query; other words are eliminated. For example, prepositions are not used for further processing. This eliminates the context of some sentences and may give inaccurate results during retrieval. For example, the difference between the two phrases "A man on a horse." and "A man and a horse." is lost. In addition, information that is to be associated with an image must be specified in a standardized form when it is input into the database. The user is involved in part-of-speech disambiguation and word-sense disambiguation. This is time-consuming and labor intensive.
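The loss of context can be seen in a minimal Python sketch. It is not the method of U.S. Pat. No. 5,493,677; it only assumes, for illustration, that everything other than the content words is discarded, and it uses a hypothetical word list in place of a real part-of-speech tagger.

```python
import re

# Hypothetical list of words that are dropped before matching.
# A real system would rely on part-of-speech tagging instead.
DROPPED = {"a", "an", "the", "on", "and", "in", "of", "to", "with"}


def content_words(phrase):
    """Return the words of a phrase that survive after dropping DROPPED words."""
    tokens = re.findall(r"[a-z]+", phrase.lower())
    return [t for t in tokens if t not in DROPPED]


riding = content_words("A man on a horse.")
beside = content_words("A man and a horse.")
```

Both phrases reduce to the same two words, "man" and "horse", so a query for one cannot be distinguished from a query for the other once the connecting word has been removed.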
Consequently, a need exists for a smart retrieval system to eliminate the above-described drawbacks.