1. Field of the Invention
This invention pertains to methods for screening video surveillance data for instances of independent motion (i.e., motion other than camera motion) and, in particular, to a method for performing such screening in real time or faster than real time on compressed video streams.
2. Discussion of the Related Art
An object in motion in a surveillance video may be interpreted as a potential activity of concern. In domestic law enforcement and security, and in military/intelligence applications, a large motion in the video may be related to activities such as attempts to access secure facilities, military maneuvers, deployment of equipment and personnel, and similar activities.
If the camera is still, automatic detection of motion in the video is trivial. However, in many applications a still camera is not possible. An example is the U.S. military's use of unmanned aerial vehicles (UAVs) in 1995 to fly over an area in Bosnia for military surveillance purposes. The goal of the surveillance was to detect any military maneuvers in that area, which typically appeared as target motion in the video. Because the cameras were themselves in motion, the problem of detecting target motion became the problem of detecting independent motion (IM), that is, motion other than the camera motion.
Because data collection was large-scale and automatic (multiple UAVs collecting data nonstop, 24 hours a day), the data volume was very large. The surveillance analysis at that time required a great deal of manpower to manually review all the data in order to extract those video shots that contained independent motion for further intelligence analysis. Manual detection of independent motion was found to be extremely tedious, very expensive, and very difficult to perform in a timely manner. Consequently, a detection method working directly with video data in real time, or even faster than real time, would have been very desirable.
When a surveillance video is taken from a camera that is also in motion, every pixel in a frame may contain motion. For the background pixels, the motion observed in the image domain corresponds to the 3D camera motion. For the pixels belonging to independently moving objects, on the other hand, the observed motion is the combination of the 3D camera motion and the objects' own independent motion in 3D space. In this case, simple frame-based differencing does not work, and more sophisticated techniques must be applied to separate the independent motion from the camera motion (called the background motion). The problem becomes even more complicated when 3D motion parallax occurs between two frames; a 3D motion model must then be applied in order to separate the independent motion from the camera motion robustly and accurately. Therefore, the problem of detecting independently moving objects is reduced to the problem of independent motion detection.
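The failure of frame differencing under camera motion can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: a one-dimensional synthetic "scanline" stands in for a frame, and the names `render` and `changed_pixels`, the texture formula, and the threshold are hypothetical. It is not a method described in this document.

```python
def render(width, cam_offset, obj_pos):
    """Render one synthetic scanline (hypothetical stand-in for a frame).

    The background is a fixed texture in world coordinates, viewed through a
    camera panned by cam_offset pixels. obj_pos is the image position of a
    bright object, or None if no object is present.
    """
    row = [((x + cam_offset) * 37) % 101 for x in range(width)]
    if obj_pos is not None:
        row[obj_pos] = 255
    return row


def changed_pixels(frame_a, frame_b, threshold=10):
    """Simple frame differencing: indices whose intensity change exceeds threshold."""
    return [i for i, (a, b) in enumerate(zip(frame_a, frame_b))
            if abs(a - b) > threshold]


# Still camera: only the pixels touched by the moving object change.
still = changed_pixels(render(32, 0, 5), render(32, 0, 6))

# Panning camera, no object at all: every pixel changes anyway.
panning = changed_pixels(render(32, 0, None), render(32, 1, None))

print(still)         # [5, 6]
print(len(panning))  # 32
```

With a still camera the difference image isolates the object, whereas a one-pixel pan makes every pixel differ even though nothing in the scene moved independently, which is why the background motion has to be modeled before independent motion can be identified.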
Two possibilities related to independent motion detection include:
a) quantitative independent motion detection, which, given a video sequence, uses temporal segmentation to find those sub-sequences (called shots) that contain a scene in which at least one independently moving object is present, and uses spatial segmentation to delineate each of the independently moving objects in each frame of these shots; and
b) qualitative independent motion detection, which, in contrast, refers only to the temporal segmentation of the video sequence to return those shots that contain independent motion, and does not use spatial segmentation to identify the independently moving objects in each frame.
Using the example of the U.S. military surveillance in Bosnia, the objective of the qualitative independent motion detection of this invention is to eliminate the painstaking and tedious manual search of very large amounts of video data for shots containing independent motion. Because the majority of the video contains no independent motion, the relatively few shots that do contain it can be detected automatically. Thus the objective is qualitative independent motion detection over the temporal sequence of the video, as opposed to quantitative independent motion detection in every frame.
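The output of qualitative detection can be pictured as a list of frame ranges rather than per-frame object outlines. The following is a minimal sketch under stated assumptions: the per-frame boolean flags are presumed to come from some detector that is not specified here, and the function name `segment_shots` is hypothetical.

```python
def segment_shots(flags):
    """Group consecutive flagged frames into (first_frame, last_frame) shots.

    flags[i] is True when frame i is judged to contain independent motion
    (how that judgment is made is outside the scope of this sketch).
    """
    shots = []
    start = None
    for i, has_motion in enumerate(flags):
        if has_motion and start is None:
            start = i                      # a new shot begins
        elif not has_motion and start is not None:
            shots.append((start, i - 1))   # the current shot ended at i - 1
            start = None
    if start is not None:                  # a shot runs to the last frame
        shots.append((start, len(flags) - 1))
    return shots


flags = [False, False, True, True, True, False, True, False, False, True]
print(segment_shots(flags))  # [(2, 4), (6, 6), (9, 9)]
```

Only the temporal extent of each shot is returned; no spatial mask of the moving objects is computed, which is the distinction drawn in item b) above.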
Motion analysis has been a topic in computer vision and image understanding research for many years. Independent motion analysis deals with multiple motion components simultaneously, and therefore is presumably more challenging.
Methods that have been proposed for independent motion detection include:
a solution assuming the camera was under translation;
a solution assuming the availability of optical flow, using the flow to group regions on the basis of the rigidity constraint over two frames;
methods based on velocity constraints to detect independently moving objects;
a method based on the rigidity constraint;
a statistical regularization solution using a Markov Random Field model;
using robust statistical regression techniques to detect independent motion;
using geometric constraints for independent motion segmentation;
a solution based on normal flow field, the spatio-temporal derivatives of the image intensity function, as opposed to the typical optical flow field;
a three-frames constraint based on a general 3D motion parallax model to detect independent motion;
using stereo camera streams to detect independent motion by using the combination of applying the normal flow field to the stereo streams and using robust statistical regression;
using a low-dimensional projection-based method to separate independent motion using the epipolar structure of rigid 3D motion flow fields;
a method based on model selection and segmentation for separating multiple 3D motion components;
a solution for the special case in which the scene may be approximated as a plane, which is valid for typical aerial surveillance; it uses spatio-temporal intensity gradient measurements to compute an exact background motion model directly, and detects independent motion from constraint violations in the mosaics developed over many frames; and
a method that simultaneously exploits both the epipolar and the shape-constancy constraints over multiple frames, based on previous work on plane-plus-parallax decomposition that explicitly estimates the epipole and the homography between a pair of frames.
Most of the existing techniques for independent motion detection are quantitative. Consequently, few, if any, of them can accomplish real-time detection, because quantitative segmentation of each frame to identify independently moving objects is computationally expensive. While quantitative detection is useful in general, a qualitative method can be more appropriate in military, intelligence, and law enforcement applications, where time is an important or even critical factor. The qualitative method of the invention saves time because it avoids spatial segmentation in the image domain in each frame, and with it the associated computation.
Moreover, it is not necessary to use a quantitative method in these applications. Even if the independently moving targets are all segmented and identified in each frame using quantitative methods, computer vision and artificial intelligence techniques are not yet able to interpret, fully automatically and without interaction with human experts, whether the segmented and identified independent motion is of any military or intelligence significance. Therefore, the detected shots must still be passed to image analysis personnel for further analysis.
An additional observation is that most of the existing techniques for independent motion detection operate on image sequences, as opposed to compressed video streams. In other words, given a video, such as a surveillance video, these methods require that the video first be fully decompressed to recover an image sequence. This restriction significantly hinders the practical application of these techniques, because information volume continues to grow, particularly in security and intelligence applications where the data volume is massive, and the video data must be stored in a compressed form, such as that of the ISO/IEC 13818-2 MPEG standard. The ISO/IEC 13818-2 MPEG standard describes a system known as MPEG-2 for encoding and decoding digital video data. Digital video data is encoded as a series of code words in a manner that causes the average length of the code words to be much smaller than would otherwise be the case.
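How code words of differing lengths reduce the average code-word length can be shown with a toy prefix code. This is a sketch for illustration only: the four symbols, their probabilities, and the code table are invented, and they are not taken from the MPEG-2 standard, whose actual code tables are defined in ISO/IEC 13818-2.

```python
# Hypothetical symbol probabilities and a matching prefix code (not MPEG-2 tables).
PROBS = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
CODE = {"A": "0", "B": "10", "C": "110", "D": "111"}


def average_length(probs, code):
    """Expected number of bits per symbol under the given code."""
    return sum(p * len(code[s]) for s, p in probs.items())


def encode(symbols, code):
    """Concatenate the code words of the symbols into one bit string."""
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    """Decode a bit string; works because no code word is a prefix of another."""
    inverse = {word: sym for sym, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


fixed_bits = 2  # four symbols need 2 bits each with equal-length code words
print(average_length(PROBS, CODE), "bits vs", fixed_bits)  # 1.75 bits vs 2
print(decode(encode("ABACAD", CODE), CODE))                # ABACAD
```

Frequent symbols receive short code words and rare symbols long ones, so the expected length drops below that of an equal-length code. A consequence relevant to the discussion above is that such a stream cannot be indexed pixel by pixel without first parsing its code words.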