The present invention is directed, in general, to image retrieval systems and, more specifically, to a system for representing the trajectory of a moving object or region in animated visual data for subsequent content-based indexing, retrieval, editing, analysis and enhanced visualization.
The advent of digital television (DTV), the increasing popularity of the Internet, and the introduction of consumer multimedia electronics, such as compact disc (CD) and digital video disc (DVD) players, have made tremendous amounts of multimedia information available to consumers. As video and animated graphics content becomes readily available and products for accessing it reach the consumer market, searching, indexing and identifying large volumes of multimedia data become even more challenging and important.
The term "visual animated data" in this disclosure refers to natural video, as well as to synthetic 2D or 3D worlds (e.g., VRML), or to a mixture of both video and graphics (e.g., MPEG-4). Different criteria are used to search and index the content of visual animated data, such as a video clip. Video processing systems have been developed for searching frames of visual animated data to detect, identify and label objects of a particular shape or color, or to detect text in the frames, such as subtitles, advertisement text, or background image text, such as a street sign or a "HOTEL" sign.
However, multimedia content-based indexing and retrieval systems rarely take into account the trajectory of objects in the frames of visual animated data. Many of these systems were developed only for still image retrieval. Some systems were later extended to animated data by first summarizing the data as consecutive sequences of shots, then representing each shot using key-frames, and finally applying to the key-frames the techniques that were developed for still images. In a few systems, consideration was given to camera motion in a shot, but still not to object trajectory.
VideoQ, developed by the ADVENT Project of the Image and Advanced TV Lab at Columbia University, is a multimedia content-based indexing and retrieval system that deals with object motion. VideoQ allows queries based on an object's motion trail(s). The motion trail of an object is described by an ordered sequence of the object's center of mass (i.e., centroid) trajectory vectors, for each time instant in the sequence.
In different application contexts dealing with visual animated data, other representations are used to deal with motion in video frames. In coding standards such as MPEG-1, MPEG-2, MPEG-4, H.261 and H.263, motion is represented as fields of two-dimensional vectors corresponding to the "motion" of blocks of pixels between successive images. Motion vectors can be skipped at any time instant on any block(s) of the image. However, any such block is then considered non-moving at that time instant. Since the pixel blocks are typically only 8×8 to 16×16 in size, this representation leads to a large number of vectors in adjacent blocks and/or consecutive images that are very similar to each other.
Moreover, although this information is called "motion" in the above standards, it was not designed to match the actual "motion" within the animated visual material. Instead, the information is used to find similarities in surrounding images that may reduce the coding cost of the current image. Therefore, such motion vectors are unsuitable for use in multimedia data indexing and retrieval.
Presently under development is a new MPEG standard, MPEG-7, which is intended to establish a standard set of "descriptive elements" that can be used to describe different aspects of multimedia data including the motion of objects. These descriptive elements, called Descriptors and Description Schemes, directly describe the content of the visual animated data, such as a video clip, thereby providing a fast and efficient way to search through an archive of video files and animated graphics files. Besides these Descriptors (D) and Description Schemes (DS), MPEG-7 will also standardize a language to express the descriptions (DDL). Descriptions are coded so that they can be transmitted and stored efficiently. The MPEG-7 standard, however, is nowhere near completion and many of its intended objectives may never be realized. There is no guarantee that the trajectory of objects will be adequately addressed.
There is therefore a need in the art for improved systems and methods for describing the trajectory of objects in a series of visual animated data frames. In particular, there is a need in the art for systems that are capable of determining the trajectory of an object in visual animated data frames and representing the detected trajectory of the objects in a Descriptor or Description Scheme that is suitable for use in a content-based indexing and retrieval system.
To address the above-discussed deficiencies of the prior art, it is a primary object of the present invention to provide a flexible and generic representation for the trajectory of objects in order to make searching and indexing easier. The disclosure does not address the coding of the description nor its expression within the description definition language (DDL). The proposed descriptive data structure, when using MPEG-7 terminology, can be considered either as a composited Descriptor or as a simple primary Description Scheme.
The present invention is not constrained to the needs of one or more particular applications or to any particular data source format. Advantageously, the present invention links descriptors to human perceptual criteria and to the actual semantic content that the data describe. Humans perceive motion at a high level. Accordingly, the present invention uses a high level description for the trajectory of an object by representing it in the scene as the trajectory of one point of the object, such as its center of mass (or centroid). In order to further describe the motion of a scene, the object-based descriptions can be complemented by a camera (or viewpoint) motion description. Finer details could also be added by complementing it with a description for the object deformation, if any.
In an advantageous embodiment of the present invention, there is provided, for use in a system capable of detecting a movement of a selected object in a sequence of visual animated data frames, a video processing device capable of generating a descriptor data structure representative of a trajectory of the selected object. The video processing device comprises an image processor capable of identifying the selected object in a first visual animated data frame and at least a second visual animated data frame and determining therefrom a trajectory of the selected object in a coordinate space having at least a first dimension and a second dimension. The image processor generates the descriptor data structure from the trajectory by generating at least two of: a) first trajectory data representing a position of the object in the coordinate space; b) second trajectory data from which a speed of the object in the coordinate space may be determined; and c) third trajectory data from which an acceleration of the object in the coordinate space may be determined.
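By way of illustration only, the descriptor data structure described above might be sketched as follows. The class name, field names, and validation rules are hypothetical and are not taken from this disclosure; the sketch merely assumes that each kind of trajectory data is stored as a coordinate tuple and that at least two of the three kinds must be present.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vector = Tuple[float, ...]  # coordinates in a 2D or 3D coordinate space


@dataclass
class TrajectoryDescriptor:
    """Hypothetical descriptor; all names here are illustrative assumptions."""
    object_id: str
    position: Optional[Vector] = None      # first trajectory data
    speed: Optional[Vector] = None         # second trajectory data
    acceleration: Optional[Vector] = None  # third trajectory data

    def __post_init__(self) -> None:
        present = [v for v in (self.position, self.speed, self.acceleration)
                   if v is not None]
        # The embodiment calls for at least two of the three kinds of data.
        if len(present) < 2:
            raise ValueError("need at least two of position, speed, acceleration")
        # All supplied data must share one coordinate space of >= 2 dimensions.
        dims = {len(v) for v in present}
        if len(dims) != 1 or min(dims) < 2:
            raise ValueError("fields must share one space of at least 2 dimensions")


car = TrajectoryDescriptor("car", position=(12.0, 40.0), speed=(3.0, 0.0))
```

A three-dimensional descriptor would be created the same way with three-element tuples.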
The present invention therefore represents the trajectory of objects in generic, high-level terms that are readily understandable to a user. Thus, a user can search for an object in a sequence of visual animated data frames, such as a video tape, simply by giving an exemplary sequence or by giving a specific speed, acceleration, or location in the frames, or a combination thereof. The video processing device can then rapidly search the trajectory descriptor table for each object in the video tape in order to find object(s) that match the user-specified search criteria.
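A search of this kind might, for example, be sketched as shown below. The table layout, the function name `find_objects`, and the rectangular region test are assumptions made for illustration; the disclosure does not prescribe any particular matching procedure.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
# Hypothetical descriptor table: object name -> (position, velocity).
DescriptorTable = Dict[str, Tuple[Point, Point]]


def find_objects(
    table: DescriptorTable,
    min_speed: Optional[float] = None,
    max_speed: Optional[float] = None,
    region: Optional[Tuple[float, float, float, float]] = None,
) -> List[str]:
    """Return names of objects matching every criterion that was supplied."""
    matches = []
    for name, (pos, vel) in table.items():
        speed = math.hypot(vel[0], vel[1])  # magnitude of the velocity vector
        if min_speed is not None and speed < min_speed:
            continue
        if max_speed is not None and speed > max_speed:
            continue
        if region is not None:
            x0, y0, x1, y1 = region  # assumed axis-aligned rectangle
            if not (x0 <= pos[0] <= x1 and y0 <= pos[1] <= y1):
                continue
        matches.append(name)
    return matches


table = {
    "car": ((10.0, 20.0), (3.0, 4.0)),
    "bird": ((50.0, 5.0), (0.0, 1.0)),
}
fast = find_objects(table, min_speed=2.0)
```

Because the criteria are combined conjunctively, a query may specify a speed range, a location, or both.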
In one embodiment of the present invention, the coordinate space comprises a first dimension, a second dimension orthogonal to the first dimension, and a third dimension orthogonal to the first and second dimensions. Thus, the present invention may be used to create and to search the descriptor tables of objects moving in two dimensions, such as the descriptor tables in a video tape of cars moving past a surveillance camera. The present invention may also be used to create and to search the descriptor tables of objects moving in three dimensions, such as the descriptor tables of objects in a VRML environment.
In another embodiment of the present invention, the second trajectory data comprises a velocity value indicating a speed of the object. In still another embodiment of the present invention, the second trajectory data comprises a start position indicating a position of the object in the first visual animated data frame, an end position indicating a position of the object in the at least a second visual animated data frame, and an elapsed time value indicating a duration of time between the first visual animated data frame and the at least a second visual animated data frame, and wherein the speed of the object is determined from the start position, the end position, and the elapsed time value. Thus, the present invention may calculate the speed of the object and save the speed value directly in the descriptor table. Alternatively, the present invention may store the speed indirectly by saving the start position of the object in one frame and the end position in another frame, along with the elapsed time between the frames, and thereafter the speed may be calculated when needed.
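The indirect form may be illustrated by a minimal sketch that computes an average speed on demand from the stored start position, end position, and elapsed time. The function name and the choice of Euclidean distance are assumptions for illustration only.

```python
import math
from typing import Sequence


def average_speed(start: Sequence[float], end: Sequence[float],
                  elapsed: float) -> float:
    """Average speed = straight-line distance travelled / elapsed time."""
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    if len(start) != len(end):
        raise ValueError("start and end must have the same dimensionality")
    distance = math.sqrt(sum((e - s) ** 2 for s, e in zip(start, end)))
    return distance / elapsed


# Object moves from (0, 0) to (3, 4) in 2 time units: 5 / 2 = 2.5.
speed = average_speed((0.0, 0.0), (3.0, 4.0), 2.0)
```

Storing the three values rather than the computed speed trades a small calculation at search time for the ability to recover the displacement as well.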
In yet another embodiment of the present invention, the speed determined from the second trajectory data is an absolute speed given in relation to the boundaries of the visual animated data frames. In a further embodiment of the present invention, the speed determined from the second trajectory data is a relative speed given in relation to a background scene of the visual animated data frames. This allows the present invention to account for the motion, if any, of the camera that recorded the sequence of animated visual data frames, or of any applicable viewpoint (e.g., a joystick simulated for 3D games). The object's trajectory may be represented in terms of its speed inside the frame boundaries (e.g., a fixed camera) or in terms of its speed relative to background objects (e.g., camera moving with the object).
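The distinction between the two forms can be shown with a short sketch. It assumes, purely for illustration, that both the object's velocity and the apparent velocity of the background are measured against the frame boundaries, so that the relative velocity is their difference; the function name is hypothetical.

```python
from typing import Tuple

Velocity = Tuple[float, float]


def relative_velocity(object_in_frame: Velocity,
                      background_in_frame: Velocity) -> Velocity:
    """Velocity of the object with respect to the background scene."""
    return (object_in_frame[0] - background_in_frame[0],
            object_in_frame[1] - background_in_frame[1])


# Camera tracking the object: the object stays fixed in the frame while the
# background appears to drift left at 5 units per frame interval.
tracked = relative_velocity((0.0, 0.0), (-5.0, 0.0))

# Fixed camera: the background does not move, so the two speeds coincide.
fixed = relative_velocity((5.0, 0.0), (0.0, 0.0))
```

In both cases the object moves at the same speed relative to the background, although its absolute speed within the frame differs.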
In a still further embodiment of the present invention, the video processing device modifies the sequence of visual animated data frames by associating the descriptor data structure with the sequence of visual animated data frames. Thus, the video processing device can generate trajectory descriptor tables for one or more objects in a video data file or other visual animated data file and associate the trajectory descriptor table(s) with the video file, such as by linking it to the file or by merging it into the file, and the like. The link used may include semantic references for linking the descriptions to an object, spatial references for linking the descriptions to a region, or temporal references for linking the descriptions to temporal positions in the file. This makes subsequent searching easier since the descriptor tables are part of the file and do not have to be generated at search time.
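One possible way of linking a descriptor to a file is sketched below as a serialized record carrying a semantic reference and a temporal reference. The record layout, key names, and use of JSON are assumptions for illustration; the disclosure does not address the coding of the description.

```python
import json
from typing import Dict, List


def link_descriptor(video_name: str, object_label: str,
                    start_frame: int, end_frame: int,
                    trajectory: Dict[str, List[float]]) -> str:
    """Build a hypothetical link record tying a descriptor to a video file."""
    if end_frame < start_frame:
        raise ValueError("end_frame must not precede start_frame")
    record = {
        "video": video_name,                 # file the description belongs to
        "semantic_ref": object_label,        # links the description to an object
        "temporal_ref": {                    # links it to a span of frames
            "start_frame": start_frame,
            "end_frame": end_frame,
        },
        "trajectory": trajectory,
    }
    return json.dumps(record, sort_keys=True)


linked = link_descriptor(
    "surveillance.mpg", "car", 30, 120,
    {"position": [12.0, 40.0], "speed": [3.0, 0.0]},
)
```

Such a record could be stored alongside the file or merged into it; either way it is available at search time without recomputation.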
A trajectory descriptor table representing the trajectory of an object in a sequence of visual animated data frames may be embodied in a novel signal produced by and searchable by a video processing device in accordance with the present invention. The novel signal comprises: 1) a visual animated data signal comprising a sequence of visual animated data frames portraying a selected object; and 2) an object trajectory descriptor signal representative of a trajectory of the selected object, the object trajectory descriptor signal comprising a descriptor data structure indicating a trajectory of the selected object in a coordinate space having at least a first dimension and a second dimension, wherein the descriptor data structure comprises at least two of: a) first trajectory data representing a position of the selected object in the coordinate space; b) second trajectory data from which a speed of the selected object in the coordinate space may be determined; and c) third trajectory data from which an acceleration of the selected object in the coordinate space may be determined.
The novel object trajectory descriptor signal may be embedded in, and transmitted with, the sequence of visual animated data frames that form the visual animated data signal. Alternatively, the novel object trajectory descriptor signal may be distinct from, and transmitted or stored separately from, the sequence of visual animated data frames.
In an advantageous embodiment of a signal in accordance with the present invention, the coordinate space comprises a first dimension, a second dimension orthogonal to the first dimension, and a third dimension orthogonal to the first and second dimensions.
In one embodiment of a signal in accordance with the present invention, the second trajectory data comprises a velocity value indicating a speed of the selected object.
In another embodiment of a signal in accordance with the present invention, the second trajectory data comprises a start position indicating a position of the selected object in a first visual animated data frame, an end position indicating a position of the selected object in a second visual animated data frame, and an elapsed time value indicating a duration of time between the first visual animated data frame and the second visual animated data frame, and wherein the speed of the selected object is determined from the start position, the end position, and the elapsed time value.
In still another embodiment of a signal in accordance with the present invention, the speed determined from the second trajectory data is an absolute speed given in relation to the boundaries of the visual animated data frames.
In yet another embodiment of a signal in accordance with the present invention, the speed determined from the second trajectory data is a relative speed given in relation to a background scene of the visual animated data frames.
The foregoing has outlined rather broadly the features and technical advantages of the present invention so that those skilled in the art may better understand the detailed description of the invention that follows. Additional features and advantages of the invention will be described hereinafter that form the subject of the claims of the invention. Those skilled in the art should appreciate that they may readily use the conception and the specific embodiment disclosed as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the invention in its broadest form.
Before undertaking the DETAILED DESCRIPTION, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document: the terms "include" and "comprise," as well as derivatives thereof, mean inclusion without limitation; the term "or" is inclusive, meaning and/or; the phrases "associated with" and "associated therewith," as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like; and the term "processor" or "controller" means any device, system or part thereof that controls at least one operation; such a device may be implemented in hardware, firmware or software, or some combination of at least two of the same. "Object" herein means any semantic entity or group of pixels selected throughout a sequence. It should be noted that the functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. Definitions for certain words and phrases are provided throughout this patent document, and those of ordinary skill in the art should understand that in many, if not most, instances, such definitions apply to prior as well as future uses of such defined words and phrases.