1. Technical Field
The invention is related to characterizing video frame sequences, and more particularly to a system and process for characterizing a video shot with one or more gray scale images each having pixels that reflect the intensity of motion associated with a corresponding region in a sequence of frames of the video shot.
2. Background Art
In recent years, many different methods have been proposed for characterizing video to facilitate such applications as content-based video classification and retrieval in large video databases. For example, key-frame based characterization methods have been widely used. In these techniques, a representative frame is chosen to represent an entire shot. This approach has limitations, though, because a single frame generally cannot convey the temporal aspects of the video. Another popular video characterization method involves the use of pixel color, such as in so-called Group of Frames (GoF) or Group of Pictures (GoP) histogram techniques. However, while some temporal information is captured by these methods, the spatial aspects of the video are lost.
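By way of illustration only, the loss of spatial information in histogram-based characterization can be sketched as follows. This is a simplified sketch and not any standardized GoF/GoP descriptor: the function name `group_histogram`, the four-bin quantization, the use of gray-level intensity in place of color, and the averaging of per-frame histograms are all assumptions made for the example.

```python
from collections import Counter


def group_histogram(frames, bins=4, max_value=256):
    """Average the normalized per-frame intensity histograms of a frame group.

    Hypothetical helper: each frame is a list of rows of 0..255 intensities.
    """
    width = max_value // bins
    totals = [0.0] * bins
    for frame in frames:
        # Count how many pixels of this frame fall into each intensity bin.
        counts = Counter(pixel // width for row in frame for pixel in row)
        n_pixels = sum(counts.values())
        for b in range(bins):
            totals[b] += counts.get(b, 0) / n_pixels
    return [t / len(frames) for t in totals]


# Two 2x2 shots of two frames each: a bright column on the left (shot_a)
# versus the same bright column on the right (shot_b).
shot_a = [[[200, 10], [200, 10]], [[200, 10], [200, 10]]]
shot_b = [[[10, 200], [10, 200]], [[10, 200], [10, 200]]]

hist_a = group_histogram(shot_a)
hist_b = group_histogram(shot_b)
```

In this toy case the two shots differ only in where the bright region lies, yet `hist_a` and `hist_b` are identical, which is the sense in which the spatial aspects are lost.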
One of the best approaches to video characterization involves harnessing the characteristics of motion. However, it is difficult to use motion information effectively, since this data is hidden behind temporal variances of other visual features, such as color, shape and texture. In addition, the complexity of describing motion in video is compounded by the fact that it is a mixture of camera and object motions. Thus, in order to use motion as the basis for video characterization, it is necessary to extract motion information from the original frame sequence, and put it into an explicit format that can be operated on readily.
There are several approaches currently used for the representation of motion in video. The primary approach is motion estimation, in which either a dense flow field is computed at the pixel level or motion model parameters are derived. The derived model parameters can be used as a motion representation for further motion analysis. However, that approach is often limited to describing consistent or global motion only. The dense flow field is a transform format of a real video frame, which can be used directly for motion-based video retrieval. However, many of the attributes of the optical flow field are not fully utilized owing to the lack of an effective and compact representation.
Another approach involves object-based techniques, such as object segmentation and tracking, or motion layer extraction. These techniques allow moving objects and their motion trajectories to be extracted and used to describe the motion in a video sequence. However, semantic objects cannot always be identified easily with these techniques, making their practical application problematic.
Yet another approach to characterizing video sequences using motion takes advantage of temporal slices of an image volume to extract motion information. Although the temporal slices encode rich motion cues suitable for many applications, there are often many spurious visual patterns that confuse the motion analysis. In addition, placing and orienting the slices so that they capture the salient motion patterns is an intractable problem. Moreover, the computational complexity of the slice-based approach is high, and its results are often not reliable.
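By way of illustration only, the sensitivity of a temporal slice to its placement can be sketched as follows. The sketch assumes the simplest kind of slice, namely one fixed image row or column stacked over time. The function names `xt_slice` and `yt_slice` and the synthetic four-frame sequence are hypothetical and are not taken from any particular prior method.

```python
def xt_slice(frames, row):
    """Stack one fixed image row from every frame; result is indexed [t][x]."""
    return [list(frame[row]) for frame in frames]


def yt_slice(frames, col):
    """Stack one fixed image column from every frame; result is indexed [t][y]."""
    return [[image_row[col] for image_row in frame] for frame in frames]


# Synthetic 3x4 sequence: one bright pixel moves one column to the right
# per frame along image row 1.
frames = []
for t in range(4):
    frame = [[0] * 4 for _ in range(3)]
    frame[1][t] = 255
    frames.append(frame)

slice_on_path = xt_slice(frames, row=1)   # cuts through the moving pixel
slice_off_path = xt_slice(frames, row=0)  # one row above the moving pixel
```

Here `slice_on_path` contains a diagonal streak whose slope reflects the horizontal speed of the pixel, whereas `slice_off_path` is entirely zero, so a slice placed one row away captures none of the motion.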
The development of MPEG video compression has brought with it various video characterizing methods using the motion vector field (MVF) that is computed as part of the video encoding process. In particular, these methods are used in conjunction with video indexing. For example, a so-called dominant motion method has been adopted in many video retrieval systems. However, this method does not provide sufficient motion information: because the MVF is not a compact representation of motion, the method computes only a coarse description of motion intensity and direction between frames. Moreover, it is impossible to discriminate object motion from camera motion in the dominant motion method. In a related method, parametric global motion estimation is used to extract object motion from the background by neutralizing global motion. However, this extraction process is not always accurate and is processor intensive.
Some existing methods characterize video based on camera motion. In these methods, qualitative descriptions of camera motion models, such as panning, tracking and zooming, are used as motion features for video retrieval. However, although camera motion is useful to filmmakers and other professional users, it is typically of little meaning to general users.
In addition to the need for video characterization in video retrieval type applications, such characterization is also useful in motion detection applications, such as surveillance and traffic monitoring. The simplest method of motion detection is based on characterizing the differences between frames. For example, the differences in pixels, edges and frame regions have been employed for this purpose. However, computing differences between frames of a video is susceptible to noise. Dense flow field characterization methods have also been employed in motion detection applications. These methods are generally more reliable than difference-based methods. However, they cannot be used in real time due to their computational complexity. Some learning-based approaches have also been proposed for motion detection. These methods learn an intensity probability distribution at each pixel, which makes them less susceptible to noise. However, it is difficult to describe motion with just one probability distribution model, since motion in video is very complex and diverse. The previously mentioned temporal slice characterization methods have also been employed in motion detection applications. For example, one such method constructs “XT” or “YT” spatio-temporal slices for detecting motions. Although these approaches are able to detect some specific motion patterns, it is difficult to select suitable slice positions and orientations because a single slice presents only part of the motion information.
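By way of illustration only, the noise sensitivity of difference-based motion detection can be sketched as follows. The sketch assumes the simplest pixel-level variant, in which a pixel is flagged as moving when its absolute intensity change between two frames exceeds a fixed threshold. The function names `motion_mask` and `count_moving`, the threshold values and the tiny test frames are hypothetical.

```python
def motion_mask(prev_frame, next_frame, threshold):
    """Flag each pixel whose absolute intensity change exceeds the threshold."""
    return [
        [abs(b - a) > threshold for a, b in zip(row_prev, row_next)]
        for row_prev, row_next in zip(prev_frame, next_frame)
    ]


def count_moving(mask):
    """Count the pixels flagged as moving in a mask."""
    return sum(flag for row in mask for flag in row)


static_scene = [[50, 50, 50], [50, 50, 50]]
noisy_scene = [[50, 58, 50], [43, 50, 50]]    # same scene with sensor noise only
moved_scene = [[50, 50, 120], [50, 50, 120]]  # a bright object enters on the right

# A low threshold reports noise as motion; a higher one suppresses it
# but would also miss genuine changes smaller than the threshold.
noise_hits_low = count_moving(motion_mask(static_scene, noisy_scene, threshold=5))
noise_hits_high = count_moving(motion_mask(static_scene, noisy_scene, threshold=20))
object_hits = count_moving(motion_mask(static_scene, moved_scene, threshold=20))
```

With the low threshold, the two noisy pixels are reported as motion even though nothing in the scene moved. Raising the threshold removes these false detections only at the cost of ignoring any true motion that produces a small intensity change.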