In recent years, applications such as the Internet have dramatically changed the way information, such as audiovisual (AV) multimedia content, may be accessed and worked with. Suggested examples include user interaction with AV multimedia content, as well as searches and queries based on the AV content. Further examples include allowing a user to request information (such as restaurants in the user's location) or to change the viewpoint from which a video scene is viewed.
To facilitate such processing, the AV content may have an associated description of the content. One technique for describing such AV content is presented in the MPEG-7 standard, otherwise known as the “Multimedia Content Description Interface”, an ISO/IEC standard developed by the Moving Picture Experts Group (MPEG). At the lowest level of description, the MPEG-7 standard may use ‘Descriptors’, which represent low-level AV features such as signal amplitude, frequency, and spectral tilt. A schema language, such as the Data Description Language (DDL), may be used to represent the results of modeling audiovisual data. The DDL specifies the syntax for creating Description Schemes from the Descriptors. The description of the AV information may be accomplished with a hierarchical tree structure or a graph structure, depending on the nature of the information.
Another conventional technique for describing audiovisual multimedia content is the use of key frames. Key frames are useful for creating summaries of video, for allowing near-random access to video, for comparing video, and for referring to video in print. However, most implementations of key frames are static, consisting of a single list of the key frames related to some quantization of the video sequence, such as a sequential segmentation into shots or scenes.
Unfortunately, for a very large piece of audiovisual content, re-sending descriptions of the content (or key frames) for a fine-grained quantization, such as individual shots, may be infeasible due to bandwidth limitations. For example, if the average shot length in a two-hour feature film is ten seconds, the film contains 720 shots and, at one key frame per shot, requires a set of 720 key frames. Assuming a transmission rate of 30 frames per second, it would take 24 seconds to re-send the key frames, which is not within an interactive response range.
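The bandwidth estimate can be reproduced from its stated premises (a two-hour film, a ten-second average shot length, and a transmission rate of 30 frames per second). The following is a minimal sketch only: the function names are illustrative, and the default of one key frame per shot is an assumption that the text does not state explicitly.

```python
import math


def key_frame_count(duration_s: float, avg_shot_s: float, frames_per_shot: int = 1) -> int:
    """Estimate the number of key frames for a video of the given duration.

    Assumption (not stated in the text): every shot contributes the same
    number of key frames, one by default.
    """
    if duration_s < 0 or avg_shot_s <= 0 or frames_per_shot < 1:
        raise ValueError("duration must be >= 0, shot length > 0, frames_per_shot >= 1")
    # A trailing partial shot still needs its own key frame, hence the ceiling.
    return math.ceil(duration_s / avg_shot_s) * frames_per_shot


def transmission_seconds(num_key_frames: int, frames_per_second: float) -> float:
    """Time needed to send the key frames at a fixed frame rate."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return num_key_frames / frames_per_second


if __name__ == "__main__":
    frames = key_frame_count(duration_s=2 * 3600, avg_shot_s=10)
    seconds = transmission_seconds(frames, frames_per_second=30)
    print(f"{frames} key frames, {seconds:.0f} s to re-send")
```

Under these assumptions the delay grows linearly with the number of key frames, so a finer quantization (shorter shots or several key frames per shot) lengthens the re-send time proportionally.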