1. Field of the Invention
The present invention is related generally to a data processing system and in particular to a method and apparatus for information management. More particularly, the present invention is directed to a computer implemented method, apparatus, and computer usable program code for cross-modal cross-vocabulary mapping and application for more effective multimedia content annotation, management, access, presentation, or distribution.
2. Description of the Related Art
Text search and retrieval is a well-explored area for handling text queries. A query is a request for information from a database or other storage device. A text query is a request for textual information from a database or other storage device containing documents or other textual information. For example, document retrieval is a method of matching a text query against a set of text documents or records to locate text information desired by a user.
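By way of illustration only, the matching of a text query against a set of text documents may be sketched as follows. This is a minimal sketch that assumes a simple term-overlap score; the function names `tokenize`, `score`, and `retrieve` are hypothetical and are not drawn from any particular retrieval system.

```python
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    """Split text into lowercase terms, dropping surrounding punctuation."""
    terms = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [term for term in terms if term]


def score(query, document):
    """Count how many query terms also occur in the document (assumed scoring rule)."""
    query_terms = Counter(tokenize(query))
    document_terms = Counter(tokenize(document))
    return sum(min(count, document_terms[term]) for term, count in query_terms.items())


def retrieve(query, documents, top_k=3):
    """Return indices of the best-matching documents, highest score first."""
    scored = [(score(query, doc), index) for index, doc in enumerate(documents)]
    matches = [(s, i) for s, i in scored if s > 0]
    # Sort by descending score; ties keep the original document order.
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in matches[:top_k]]


documents = [
    "The basketball game ended in overtime.",
    "Stock prices fell sharply today.",
    "Highlights from the basketball playoffs game.",
]
matching_indices = retrieve("basketball game", documents)
```

In this sketch, documents that share no terms with the query are excluded from the result, and the remaining documents are ranked by the number of shared terms.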
Search and retrieval are also vital parts of multimedia content management and are receiving increasing amounts of attention with the growing use of multimedia libraries. Multimedia refers to resources that span multiple modalities, including, but not limited to, audio, video, and text modalities. For example, a resource could include video images of a basketball game and a sound file containing the audio corresponding to the video images.
While the problems associated with text retrieval and text queries are well understood and have a wide range of effective solutions, search and retrieval of multimedia resources that include a combination of audio, video, and/or text modalities have not been explored to the same degree. A major challenge for multimedia management systems is the gap between the way multimedia content is stored or represented in a computer system and the way users search for it. For example, digital images are typically stored as pixels and are sometimes associated with, or represented by, low-level visual features, such as colors, textures, shapes, etc. While this may enable searching of images by visual similarity to other example images, users are typically more interested in searching by textual keywords or semantic concepts of interest.
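As a hedged illustration of searching images by visual similarity using a low-level feature, the following sketch compares coarse color histograms. It assumes that an image is given as a list of (red, green, blue) pixel tuples with channel values from 0 to 255, and it uses histogram intersection as one possible similarity measure; the names `color_histogram` and `histogram_intersection` are hypothetical.

```python
def color_histogram(pixels, bins_per_channel=2):
    """Build a normalized color histogram from (r, g, b) pixel tuples."""
    histogram = [0.0] * (bins_per_channel ** 3)
    for red, green, blue in pixels:
        # Map each 0-255 channel value to a bin index in [0, bins_per_channel).
        r_bin = red * bins_per_channel // 256
        g_bin = green * bins_per_channel // 256
        b_bin = blue * bins_per_channel // 256
        index = (r_bin * bins_per_channel + g_bin) * bins_per_channel + b_bin
        histogram[index] += 1.0
    total = len(pixels)
    if total == 0:
        return histogram
    return [count / total for count in histogram]


def histogram_intersection(first, second):
    """Similarity between two normalized histograms: 0.0 (disjoint) to 1.0 (identical)."""
    return sum(min(a, b) for a, b in zip(first, second))


# Tiny example images, each a flat list of pixels (assumed input format).
red_image = [(255, 0, 0)] * 4
mostly_red_image = [(250, 10, 5)] * 3 + [(0, 0, 255)]
blue_image = [(0, 0, 255)] * 4

query_histogram = color_histogram(red_image)
similarity_to_mostly_red = histogram_intersection(
    query_histogram, color_histogram(mostly_red_image)
)
similarity_to_blue = histogram_intersection(
    query_histogram, color_histogram(blue_image)
)
```

Such a comparison ranks images by color content only. It says nothing about what an image depicts, which is the gap between stored representation and user intent described above.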
In some domains, such as the World Wide Web, images have annotations. Annotations are textual information or metadata associated with an image or other information, such as, without limitation, a title, author, date, and/or description, which enables textual searching of visual content. However, in many other domains, such textual information is not available or is very limited. The limited availability of annotations makes it difficult, if not impossible, for a user to search for images using textual keywords.
Successful multimedia systems require approaches to retrieval of non-textual information, as well as effective fusion of information from different modalities. However, currently available text query and text retrieval methods do not provide a capability to query multimodal documents that combine associated text with rich, unstructured multimedia data.