The present invention is directed, in general, to video processing systems and, more specifically, to a system for identifying and describing the content of visual animated data.
The advent of digital television (DTV), the increasing popularity of the Internet, and the introduction of consumer multimedia electronics, such as compact disc (CD) and digital video disc (DVD) players, have made tremendous amounts of multimedia information available to consumers. As video and animated graphics content becomes readily available and products for accessing it reach the consumer market, searching, indexing and identifying large volumes of multimedia data becomes even more challenging and important.
The term "visual animated data" herein refers to natural video, as well as to synthetic 2D or 3D worlds (e.g., VRML), or to a mixture of both video and graphics (e.g., MPEG-4). Different criteria are used to search and index the content of visual animated data, such as a video clip. Video processing systems have been developed for searching frames of visual animated data to detect, identify and label objects of a particular shape or color, or to detect text in the frames, such as subtitles, advertisement text, or background image text, such as a street sign or a "HOTEL" sign.
Presently under development is a new MPEG standard, MPEG-7, which is intended to establish a standard set of "descriptors" that can be used to describe different aspects of visual animated data. The descriptors, or combinations of descriptors and description schemes, directly describe the content of visual animated data, such as a video clip, thereby providing a fast and efficient way to search through an archive of video files and animated graphics files. MPEG-7 is intended to standardize some descriptors and description schemes in a comprehensive description definition language (DDL) to describe the content of visual animated data.
A descriptor, at its most basic, is a representation of an attribute of a feature (or object) in visual animated data. A feature can be something very basic, such as the color of a pixel in a specific frame in a movie, or a feature can be something more conceptual and broad, such as the name of the movie or the age of the character portrayed within the story of the movie. Collections of related descriptors are called description schemes. The language for creating these descriptors and description schemes is called a "description definition language" or DDL.
One goal of MPEG-7 is to allow content creators and content editors to describe any feature of visual animated data content in a manner that can be used by others and can be used for searching and retrieving the visual animated data content by the final consumers. Descriptors are coded so that they can be transmitted and stored efficiently. The MPEG-7 standard, however, is far from completion and many of its intended objectives may never be realized. Additionally, many of the MPEG-7 standard proposals include a full language for creating descriptors. The proposed languages allow a descriptor creator to specify the descriptor in a freeform manner using the syntax and semantics of the specific language. This is a "script-based" approach in which each descriptor is a script that can be used whenever a specific feature needs to be described. Under this approach, one descriptor may look nothing like any other descriptor in the DDL. Thus, the descriptors and description schemes that are created may be highly individualized with little commonality according to the choices of the descriptor creator.
There is therefore a need in the art for improved systems and methods for searching and indexing the content of visual animated data including video clips. More particularly, there is a need for a description definition language (DDL) that implements highly structured descriptors and description schemes that are readily recognizable and searchable by parser programs and other applications that detect and analyze descriptor information associated with visual animated data.
To address the above-discussed deficiencies of the prior art, it is a primary object of the present invention to provide a template containing a standard set of attributes that can be used to describe any feature. Each template comprises a descriptor. A user may describe a feature using a standard template and fill in values to create the descriptor. Using the description definition language to create descriptors, a content creator can describe the lower-level individual features of the multimedia content being created. The content creator can also describe the relationships between these lower level features and collect descriptors into logical groupings using description schemes.
All descriptors and description schemes created in accordance with the principles of the present invention are based on the standard template with some variations. Using a predefined template or set of templates, rather than script-based descriptors, makes the descriptors and description schemes of a visual animated data file easily recognizable and searchable.
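The benefit of a fixed template for searching may be illustrated by the following minimal Python sketch, in which every descriptor carries the same set of fields so that a single generic routine can query any of them. The field names (`id`, `name`, `feature_type`, `value`) and the function `find_descriptors` are hypothetical and chosen only for illustration; they are not prescribed by the present disclosure or by MPEG-7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """One filled-in template; the field names are illustrative assumptions."""
    id: int
    name: str
    feature_type: str  # e.g. "object", "frame" or "sequence"
    value: str


def find_descriptors(descriptors, **criteria):
    """Return the descriptors whose fields equal every given criterion."""
    return [
        d for d in descriptors
        if all(getattr(d, field) == wanted for field, wanted in criteria.items())
    ]


# A small archive in which every entry has the same shape.
archive = [
    Descriptor(id=1, name="color", feature_type="object", value="red"),
    Descriptor(id=2, name="text", feature_type="frame", value="HOTEL"),
    Descriptor(id=1, name="color", feature_type="object", value="blue"),
]

matches = find_descriptors(archive, name="color", value="red")
print(matches)
```

Because each descriptor exposes identical fields, the search routine needs no knowledge of how an individual descriptor was authored, which is the property a script-based descriptor may lack.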
Accordingly, in one embodiment of the present invention, there is provided a video processing device capable of generating a descriptor data structure representative of a selected feature in a visual animated data file. The video processing device comprises: 1) user input means capable of selecting the selected feature and generating a plurality of attribute values associated with the selected feature; and 2) an image processor capable of identifying the selected feature in the visual animated data file and receiving the plurality of attribute values from the user input means and, in response to receipt of the plurality of attribute values, generating the descriptor data structure by inserting selected ones of the plurality of attribute values into corresponding ones of a plurality of pre-defined attribute fields in a standard descriptor template.
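The step of inserting selected attribute values into corresponding pre-defined attribute fields may be sketched as follows. This is a minimal Python illustration under stated assumptions: the field names in `STANDARD_TEMPLATE_FIELDS`, the function `generate_descriptor`, and the use of a dictionary to hold the descriptor data structure are all hypothetical and are not taken from the present disclosure.

```python
# Hypothetical pre-defined attribute fields of a standard descriptor template.
STANDARD_TEMPLATE_FIELDS = ("id", "name", "feature_type", "value")


def generate_descriptor(attribute_values, template_fields=STANDARD_TEMPLATE_FIELDS):
    """Insert matching attribute values into the template's pre-defined fields.

    Fields with no supplied value stay None; values that match no
    pre-defined field are ignored.
    """
    descriptor = dict.fromkeys(template_fields)  # start from an empty template
    for field in template_fields:
        if field in attribute_values:
            descriptor[field] = attribute_values[field]
    return descriptor


# Attribute values as they might arrive from the user input means.
user_values = {
    "id": 7,
    "name": "sign text",
    "value": "HOTEL",
    "comment": "not a template field",
}
descriptor = generate_descriptor(user_values)
print(descriptor)
```

In this sketch the `comment` entry is dropped because it matches no pre-defined field, and `feature_type` remains `None` because no value was supplied for it.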
According to one embodiment of the present invention, the image processor is further capable of associating the descriptor data structure with the visual animated data file to thereby produce a modified visual animated data file, wherein the selected feature may be identified in the modified visual animated data file by examining the descriptor data structure.
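One way to picture this association is given in the minimal Python sketch below. The class `VisualAnimatedDataFile`, the functions `associate` and `has_feature`, and the representation of a descriptor as a dictionary are hypothetical stand-ins; the disclosure does not specify how the descriptor data structure is attached to the file.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VisualAnimatedDataFile:
    """Stand-in for a video or graphics file; only the parts needed here."""
    content: bytes
    descriptors: tuple = ()


def associate(data_file, descriptor):
    """Return a modified file that carries the descriptor alongside the content."""
    return replace(data_file, descriptors=data_file.descriptors + (descriptor,))


def has_feature(data_file, name):
    """Identify a feature by examining the descriptors only, not the content."""
    return any(d.get("name") == name for d in data_file.descriptors)


original = VisualAnimatedDataFile(content=b"...frames...")
modified = associate(original, {"id": 7, "name": "sign text", "value": "HOTEL"})

print(has_feature(original, "sign text"), has_feature(modified, "sign text"))
```

The content bytes are left unchanged; only the attached descriptors differ between the original and the modified file, so a search application need not decode any image data to locate the feature.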
According to another embodiment of the present invention, the selected feature is an object appearing in the visual animated data file and the descriptor data structure contains attribute values representative of the object.
According to still another embodiment of the present invention, the selected feature is an image frame in the visual animated data file and the descriptor data structure contains attribute values representative of the image frame.
According to yet another embodiment of the present invention, the selected feature is a sequence of image frames in the visual animated data file and the descriptor data structure contains attribute values representative of the sequence of image frames.
According to a further embodiment of the present invention, the descriptor template further comprises a plurality of user-defined attribute fields and wherein the image processor is capable of receiving a plurality of user-defined attribute values from the user input means and inserting selected ones of the plurality of user-defined attribute values in corresponding ones of the user-defined attribute fields.
According to a still further embodiment of the present invention, the plurality of pre-defined attribute fields in a standard descriptor template comprises a unique identification (ID) attribute field, wherein the plurality of pre-defined attribute fields are the same for descriptor data structures having the same ID attribute field.
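The relationship between the ID attribute field, the pre-defined attribute fields and the user-defined attribute fields may be sketched in Python as follows. The registry `PREDEFINED_FIELDS_BY_ID`, its ID values and field names, and the functions `build_descriptor` and `same_predefined_fields` are hypothetical and serve only to illustrate the idea that descriptors sharing an ID share the same pre-defined fields.

```python
# Hypothetical registry: each ID value fixes the set of pre-defined fields.
PREDEFINED_FIELDS_BY_ID = {
    1: ("id", "name", "value"),
    2: ("id", "name", "start_frame", "end_frame"),
}


def build_descriptor(predefined_values, user_defined_values=None):
    """Fill the pre-defined fields selected by the ID, then attach user-defined fields."""
    descriptor_id = predefined_values["id"]
    fields = PREDEFINED_FIELDS_BY_ID[descriptor_id]  # KeyError for an unknown ID
    descriptor = {f: predefined_values.get(f) for f in fields}
    descriptor["user_defined"] = dict(user_defined_values or {})
    return descriptor


def same_predefined_fields(a, b):
    """Check that two descriptors expose identical pre-defined fields."""
    keys_a = [k for k in a if k != "user_defined"]
    keys_b = [k for k in b if k != "user_defined"]
    return keys_a == keys_b


red = build_descriptor({"id": 1, "name": "color", "value": "red"},
                       {"annotator": "editor A"})
blue = build_descriptor({"id": 1, "name": "color", "value": "blue"})
shot = build_descriptor({"id": 2, "name": "shot", "start_frame": 0, "end_frame": 120})

print(same_predefined_fields(red, blue), same_predefined_fields(red, shot))
```

Keeping the user-defined values in a separate sub-structure is a design choice of this sketch: it lets a parser rely on the pre-defined fields for a given ID while still tolerating arbitrary additions from the content creator.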
The foregoing has outlined rather broadly the features and technical advantages of the present invention so that those skilled in the art may better understand the detailed description of the invention that follows. Additional features and advantages of the invention will be described hereinafter that form the subject of the claims of the invention. Those skilled in the art should appreciate that they may readily use the conception and the specific embodiment disclosed as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the invention in its broadest form.
Before undertaking the DETAILED DESCRIPTION, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document: the terms "include" and "comprise," as well as derivatives thereof, mean inclusion without limitation; the term "or" is inclusive, meaning and/or; the phrases "associated with" and "associated therewith," as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like; and the term "processor" or "controller" means any device, system or part thereof that controls at least one operation, and such a device may be implemented in hardware, firmware or software, or some combination of at least two of the same. It should be noted that the functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. Definitions for certain words and phrases are provided throughout this patent document; those of ordinary skill in the art should understand that in many, if not most, instances, such definitions apply to prior, as well as future, uses of such defined words and phrases.