1. Field of Invention
The present invention generally relates to audiovisual data representation. More particularly, this invention relates to integrating the descriptions of multiple categories of audiovisual content so that such content can be easily searched or browsed in, for example, digital libraries, Internet web sites and broadcast media.
2. Description of Related Art
More and more audiovisual information is becoming available from many sources around the world. Such information may be represented in various forms of media, such as still pictures, video, graphics, 3D models, audio and speech. Audiovisual information plays an important role in society, whether it is recorded on media such as film or magnetic tape or originates in real time from audio or visual sensors, and whether it is analogue or, increasingly, digital.
While audio and visual information used to be consumed directly by human beings, computational systems are increasingly creating, exchanging, retrieving and re-processing it. Such is the case for image understanding (e.g., surveillance, intelligent vision and smart cameras), media conversion (e.g., speech to text, picture to speech and speech to picture), information retrieval (e.g., quickly and efficiently searching for multimedia documents of interest to the user) and filtering a stream of audiovisual content to receive only those multimedia data items that satisfy the user's preferences.
For example, a code in a television program may trigger a suitably programmed VCR to record that program, or an image sensor may trigger an alarm when a certain visual event occurs. Automatic transcoding may be performed based on a string of characters or on audible information, or a search may be performed in a stream of audio or video data. In all these examples, the audiovisual information has been suitably "encoded" to enable a device or a computer program to take some action.
Even in their infancy, web-based information communication and access systems routinely transfer, search, retrieve and process information. At present, much of this information is represented in text form and is accessed using text-based search algorithms.
However, as web-based systems and multimedia technology continue to improve, more and more information is becoming available in forms other than text, for instance as images, graphics, speech, animation, video, audio and movies. As the volume of such information increases rapidly, it is becoming important to be able to search for and retrieve a specific piece of information of interest easily. Such information is often difficult to find by text-only search. Thus, the increasing presence of multimedia information, and the need to find the required portions of it easily and reliably irrespective of the search engine employed, have spurred the drive for a standard for accessing such information.
The Moving Picture Experts Group (MPEG) is a working group of the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) in charge of developing international standards for the compression, decompression, processing and coded representation of video data, audio data and their combination. MPEG previously developed the MPEG-1, MPEG-2 and MPEG-4 standards, and is presently developing the MPEG-7 standard, formally called "Multimedia Content Description Interface", hereby incorporated by reference in its entirety.
MPEG-7 will be a content representation standard for multimedia information search and will include techniques for describing individual media content and combinations of media content. The MPEG-7 standard thus aims to provide a set of standardized tools for describing multimedia content. Unlike the MPEG-1, MPEG-2 and MPEG-4 standards, MPEG-7 is not a media content coding or compression standard but a standard for representing descriptions of media content. The data representing such descriptions is called "meta data". Thus, irrespective of how the media content itself is represented (e.g., analogue, PCM, MPEG-1, MPEG-2, MPEG-4, QuickTime or Windows Media), the meta data associated with that content may in future be represented in MPEG-7.
Often, the value of multimedia information depends on how easily it can be found, retrieved, accessed, filtered and managed. Although users have increasing access to audiovisual information, searching, identifying and managing it efficiently is becoming more difficult because of its sheer volume. Moreover, the problem of identifying and managing multimedia content is not restricted to database retrieval applications such as digital libraries, but extends to areas such as broadcast channel selection, multimedia editing and multimedia directory services.
Although techniques for tagging audiovisual information allow some limited access and processing by text-based search engines, the amount of information that can be included in such tags is limited. For movie videos, for example, a tag may give the title of the movie or a list of actors, but this information applies to the entire movie and cannot be subdivided to indicate the content of individual shots or of the objects within those shots. Moreover, the architecture for searching and processing the information in such tags is itself severely limited.
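The granularity problem described above can be illustrated with a minimal sketch. It assumes two simplified, hypothetical record layouts: a flat movie-level tag holding only a title and a list of actors, and a list of shot-level descriptions, each giving a time range and the objects visible in it. The names MovieTag, Shot, tag_matches and find_shots, and all sample values, are illustrative only and are not taken from MPEG-7 or any other specification.

```python
from dataclasses import dataclass, field


@dataclass
class MovieTag:
    """Hypothetical flat tag: one set of text fields for the entire movie."""
    title: str
    actors: list[str]


@dataclass
class Shot:
    """Hypothetical shot-level description (illustrative, not MPEG-7 syntax)."""
    start_s: float          # shot start time, in seconds
    end_s: float            # shot end time, in seconds
    objects: list[str] = field(default_factory=list)


def tag_matches(tag: MovieTag, term: str) -> bool:
    """A text search over a flat tag can only say whether the whole movie matches."""
    return term == tag.title or term in tag.actors


def find_shots(shots: list[Shot], term: str) -> list[tuple[float, float]]:
    """With per-shot descriptions, a search can return the matching time ranges."""
    return [(s.start_s, s.end_s) for s in shots if term in s.objects]


tag = MovieTag(title="Example Movie", actors=["Actor A", "Actor B"])
shots = [
    Shot(0.0, 12.5, ["beach", "Actor A"]),
    Shot(12.5, 30.0, ["car"]),
    Shot(30.0, 41.0, ["car", "Actor B"]),
]

print(tag_matches(tag, "car"))   # False: the flat tag has no field for objects in shots
print(find_shots(shots, "car"))  # [(12.5, 30.0), (30.0, 41.0)]
```

In this sketch, a query for an object seen only in some shots fails against the movie-level tag, whereas the shot-level descriptions locate the relevant portions of the movie.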