The Moving Picture Experts Group (MPEG) develops standards concerning audiovisual content. One component of the MPEG standard scheme is the MPEG-7 standards, which are directed to providing descriptions of audiovisual content that may be of interest to the user. Specifically, the MPEG-7 standards are developed to standardize information describing the audiovisual content. The MPEG-7 standards may be used in various areas, including storage and retrieval of audiovisual items from databases, broadcast media selection, tele-shopping, multimedia presentations, personalized news services on the Internet, etc.
According to MPEG-7 standards, descriptions of audiovisual content consist of descriptors and description schemes. Descriptors represent features of audiovisual content and define the syntax and the semantics of each feature representation. Description schemes (DS) specify the structure and semantics of the relationships between their components. These components may be both descriptors and description schemes. Conceptual aspects of a description scheme can be organized in a tree or in a graph. The graph structure is defined by a set of nodes that represent elements of a description scheme and a set of edges that specify the relationship between the nodes.
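As an illustration only, the graph organization described above can be sketched as a set of nodes and a list of labeled edges. The class name, method names, and relation labels below are hypothetical and are not MPEG-7 syntax; the sketch merely assumes that each edge carries a name for the relationship it specifies.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class DescriptionGraph:
    """Hypothetical graph view of a description scheme (not MPEG-7 syntax)."""

    # Nodes stand for elements of the description scheme.
    nodes: Set[str] = field(default_factory=set)
    # Each edge is (source node, relationship label, target node).
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_edge(self, source: str, relation: str, target: str) -> None:
        """Record a relationship and register both endpoints as nodes."""
        self.nodes.update((source, target))
        self.edges.append((source, relation, target))

    def related(self, node: str) -> List[Tuple[str, str]]:
        """Return (relationship, target) pairs for edges leaving `node`."""
        return [(rel, tgt) for src, rel, tgt in self.edges if src == node]


# Illustrative content; the element and relation names are assumptions.
graph = DescriptionGraph()
graph.add_edge("Wedding", "hasAgent", "Bride")
graph.add_edge("Wedding", "hasAgent", "Groom")
graph.add_edge("Bride", "wears", "Dress")
```

A tree is the special case in which every node except the root is the target of exactly one edge.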
Descriptions (i.e., descriptors and DSs) of audiovisual content are divided into segment descriptions and semantic descriptions. Segment descriptions describe the audiovisual content from the viewpoint of its structure. That is, the descriptions are structured around segments which represent physical spatial, temporal or spatio-temporal components of the audiovisual content. Each segment may be described by signal-based features (color, texture, shape, motion, audio features, etc.) and some elementary semantic information.
Semantic descriptions describe the audiovisual content from a conceptual viewpoint, i.e., the semantic descriptions describe the actual meaning of the audiovisual content rather than its structure. The segment descriptions and semantic descriptions are related by a set of links, which allows the audiovisual content to be described on the basis of both content structure and semantics together. The links relate different semantic concepts to the instances within the audiovisual content described by the segment descriptions.
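The linking of semantic descriptions to segment descriptions can be pictured with a minimal sketch such as the following. All type names, field names, and sample values are assumptions made for illustration; a temporal segment is modeled simply as a start and end time, and a link as a pair of one semantic entity and one segment.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """Hypothetical temporal segment of the content (structural view)."""
    segment_id: str
    start_s: float
    end_s: float


@dataclass(frozen=True)
class SemanticEntity:
    """Hypothetical semantic entity, e.g. an event or an object."""
    label: str
    kind: str


@dataclass(frozen=True)
class Link:
    """Relates one semantic entity to one instance in the content."""
    entity: SemanticEntity
    segment: Segment


def segments_for(label: str, links: List[Link]) -> List[Segment]:
    """Segments in which the named semantic entity occurs."""
    return [ln.segment for ln in links if ln.entity.label == label]


def entities_in(segment_id: str, links: List[Link]) -> List[SemanticEntity]:
    """Semantic entities linked to the given segment."""
    return [ln.entity for ln in links if ln.segment.segment_id == segment_id]


# Illustrative data only.
s1 = Segment("s1", 0.0, 42.5)
s2 = Segment("s2", 42.5, 90.0)
wedding = SemanticEntity("wedding", "event")
car = SemanticEntity("car", "object")
links = [Link(wedding, s1), Link(wedding, s2), Link(car, s2)]
```

With links of this kind, the same content can be queried either by structure (which entities appear in segment s2) or by meaning (which segments show the wedding).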
Current semantic descriptions are limited in their descriptive capabilities because they describe specific semantic entities without identifying the relationships between these specific semantic entities and other related semantic entities. For instance, the current model of a semantic description includes multiple DSs for various semantic entities such as an event, an object, a state, an abstract concept, etc. An event DS describes a meaningful temporal localization. For example, an event DS may be associated with a concrete instance in the real world or the media (e.g., a wedding). An object DS semantically describes a specific object (e.g., a car depicted in an image). A state DS identifies semantic properties of the entity (e.g., of an object or event) at a given time, in a given spatial location, or in a given media location. A concept DS describes abstract elements that are not created by abstraction from concrete objects and events. Concepts such as freedom or mystery are typical examples of entities described by concept descriptions.
The above DSs describe specific entities. However, a description cannot be complete if it only describes an individual entity by itself. Most human description and communication is accomplished by bringing information together; information is seldom completely delineated in any exchange. Hints are present in speech that cause both parties to construct reasonably compatible or similar mental models, and the information is discussed within that context. Accordingly, a description cannot accurately and completely describe the content unless it contains various additional information related to this content. This additional information may include background information, context information, information identifying relationships between the content being described and other entities, etc.
In addition, no current mechanism exists for creating descriptions of metaphors or analogies. A traditional opinion is that semantic descriptions should only describe audiovisual material and, therefore, there is no need to create metaphorical descriptions. However, humans use metaphors and analogies all the time without realizing it. Metaphors and analogies such as “feeling like a fish out of water,” “getting close to the deadline,” “flying like a bird,” etc. are inherent in human communication. Thus, it would be undesirable to exclude descriptions of metaphors and analogies from a list of possible descriptions.
Further, current semantic descriptions are static. When the material described by an existing semantic description changes, the process of creating a description must be performed anew to produce a new semantic description describing the changed material.
Accordingly, a tool is required to create semantic descriptions that are capable of completely and accurately describing any semantic situation, audiovisual or otherwise. Such a tool should also be able to create descriptions that dynamically reflect changes in the material being described.