With heightened awareness of security threats, interest in video surveillance technology and its applications has become widespread. Historically, such surveillance has relied on traditional closed circuit television (CCTV). However, CCTV surveillance has recently declined in popularity because of the rapidly growing presence of video networks in the security market. Video networks, and in particular intelligent video surveillance technologies, give the security industry and other industries the ability to automate intrusion detection, to track a source of unauthorized movement for as long as it remains on the premises, and to categorize moving objects. One aspect of this technology, video object segmentation (also known as video motion detection), is among the most challenging tasks in video processing, and is critical for video compression standards as well as for recognition, event analysis, understanding, and video manipulation.
Any video motion detection algorithm should meet certain functional and performance requirements. Such requirements may include that false positives are kept to a minimum; that the detection probability is close to 100%; that the algorithm is insensitive to environmental variations such as snow, rain, and wind; that the algorithm works across a broad spectrum of lighting conditions (from well lit to poorly lit); that the algorithm provides robust results irrespective of camera positioning; that the algorithm handles variations and clutter in a scene due to camera vibrations, overlapping objects, slow and fast moving objects, and objects arriving into and departing from the scene; and that the algorithm handles shadows and reflections. Video motion detection (VMD) therefore poses a challenge because of the numerous variations that occur in typical outdoor and indoor scenarios. Motion detection schemes known in the art meet these requirements to one degree or another. These known schemes fall into one of three categories: temporal frame differencing, optical flow, and background subtraction.
The basic principle of temporal differencing based schemes is the calculation of an absolute difference at each pixel between two or three consecutive frames, and the application of a threshold to extract the moving object region. Though this method is simple to implement, it often fails to extract the complete moving region, especially the interior of a moving object, because pixels inside a uniformly colored object change little from one frame to the next.
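The differencing step described above can be sketched in a few lines. This is a minimal illustration only, assuming 8-bit grayscale frames held in NumPy arrays; the function name `frame_difference_mask`, the helper `make_frame`, and the threshold value of 25 are hypothetical choices, not taken from any particular known scheme.

```python
import numpy as np

def frame_difference_mask(prev_frame, curr_frame, threshold=25):
    """Flag pixels whose absolute change between two frames exceeds threshold."""
    # Cast to a signed type so the subtraction cannot wrap around in uint8.
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

def make_frame(left, width=10, size=40, obj=200, bg=20):
    """Synthetic frame (assumption): uniform bright box on a dark background."""
    frame = np.full((size, size), bg, dtype=np.uint8)
    frame[10:30, left:left + width] = obj
    return frame

# The box moves three pixels to the right between the two frames.
prev_frame = make_frame(left=10)
curr_frame = make_frame(left=13)
mask = frame_difference_mask(prev_frame, curr_frame)

# Only the trailing and leading edges of the box are flagged; the interior
# columns, where the box overlaps itself, show no change.
print(mask[15, 10:23].astype(int))
```

In this synthetic case the mask marks only the strips uncovered and newly covered by the box, which illustrates why the interior of a uniformly colored object tends to be missed.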
The optical flow based method of motion segmentation uses characteristics of the flow vectors of moving objects over time to detect moving regions in an image sequence. For example, one known method computes a displacement vector field to initialize a contour based tracking algorithm, called active rays, for the extraction of moving objects in gait analysis. Though optical flow based methods work effectively even under camera movement, they require extensive computational resources. Such methods are also sensitive to noise, and their computational cost makes them impractical for real-time video analysis.
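To make the notion of a flow vector concrete, the following sketch estimates a single displacement for a whole frame by solving the brightness constancy equations in a least-squares sense (a Lucas-Kanade style estimate). It is not the active rays method mentioned above, and the names `global_flow` and `gaussian_blob` are hypothetical. A practical scheme would repeat this solve for a window around every pixel, which hints at the computational cost.

```python
import numpy as np

def global_flow(frame1, frame2):
    """Estimate one (u, v) displacement for the whole frame by least squares."""
    f1 = frame1.astype(np.float64)
    f2 = frame2.astype(np.float64)
    # Spatial gradients of the averaged frames; np.gradient returns the
    # derivative along rows (y) first, then along columns (x).
    iy, ix = np.gradient(0.5 * (f1 + f2))
    it = f2 - f1  # temporal derivative
    # Normal equations of ix*u + iy*v + it = 0 summed over all pixels.
    a = np.array([[np.sum(ix * ix), np.sum(ix * iy)],
                  [np.sum(ix * iy), np.sum(iy * iy)]])
    b = -np.array([np.sum(ix * it), np.sum(iy * it)])
    u, v = np.linalg.solve(a, b)
    return float(u), float(v)

def gaussian_blob(cx, cy, size=64, sigma=6.0):
    """Synthetic smooth object (assumption) centered at column cx, row cy."""
    y, x = np.mgrid[0:size, 0:size]
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))

# The blob moves one pixel to the right between the two frames.
u, v = global_flow(gaussian_blob(30, 32), gaussian_blob(31, 32))
print(u, v)  # u should be close to 1 and v close to 0
```

The estimate relies on small displacements and smooth intensity; noise in the gradients feeds directly into the solve, which is consistent with the noise sensitivity noted above.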
In background subtraction techniques, each pixel in the video frame is modeled in order to classify it as a background (BGND) or foreground (FGND) pixel, thereby determining the presence or absence of motion at that pixel. Particular background modeling methods include the Hidden Markov Model (HMM), adaptive background subtraction, and Gaussian Mixture Models (GMM). In most applications, these methods have been limited by the availability of high speed computational resources. Consequently, the methods that have been used were designed to handle video captured under rather restricted or controlled situations. However, as processor speeds have increased and processors have become smaller, systems have been designed to model real-world processes under a wide range of varying conditions, beyond such restricted or controlled scenarios.
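As one illustration of the adaptive background subtraction family, the sketch below keeps a per-pixel running average as the background model and labels a pixel FGND when it deviates from that model by more than a threshold. The class name `RunningAverageBackground`, the learning rate `alpha`, and the threshold are assumptions chosen for the example, not values from any specific known method.

```python
import numpy as np

class RunningAverageBackground:
    """Per-pixel running-average background model (illustrative sketch)."""

    def __init__(self, first_frame, alpha=0.05, threshold=30.0):
        self.background = first_frame.astype(np.float64)
        self.alpha = alpha          # learning rate of the model update
        self.threshold = threshold  # minimum deviation labeled as FGND

    def apply(self, frame):
        """Return a boolean FGND mask and adapt the model on BGND pixels."""
        frame = frame.astype(np.float64)
        foreground = np.abs(frame - self.background) > self.threshold
        # Update only BGND pixels so a moving object is not absorbed at once.
        bgnd = ~foreground
        self.background[bgnd] += self.alpha * (frame[bgnd] - self.background[bgnd])
        return foreground

# Synthetic sequence (assumption): the scene brightens slowly, then an
# object appears in the final frame.
model = RunningAverageBackground(np.full((20, 20), 50, dtype=np.uint8))
drift_masks = [model.apply(np.full((20, 20), 50 + step, dtype=np.uint8))
               for step in range(1, 11)]
scene = np.full((20, 20), 61, dtype=np.uint8)
scene[5:10, 5:10] = 200
object_mask = model.apply(scene)

print(any(m.any() for m in drift_masks))  # gradual lighting drift is tolerated
print(int(object_mask.sum()))             # number of pixels labeled FGND
```

The learning rate trades responsiveness against stability: a larger `alpha` follows lighting changes faster but also lets slow-moving objects fade into the background sooner.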
Increased processor power has therefore made background subtraction a viable means of VMD. In particular, the separation of background (BGND) and foreground (FGND) information using a background model, followed by an adaptive model update, has become a popular approach in most VMD methods to identify and segment moving objects. The accuracy and performance of many of these modeling schemes depend largely on the model initialization procedure. Of these models, the Gaussian Mixture Model, which models the individual pixel variations over a number of frames, has been used successfully in many applications. The Gaussian Mixture Model uses Expectation Maximization (EM) based model initialization. Notwithstanding its success in many applications, the Gaussian Mixture Model has drawbacks, notably its rather high computational resource requirements. Consequently, video processing applications that use background modeling methods such as the Gaussian Mixture Model would benefit from an improvement in these background modeling methods.
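The resource cost of EM based initialization can be seen in a minimal sketch that fits a one-dimensional Gaussian mixture to the intensity history of a single pixel. This is a generic textbook EM loop, not the specific initialization of any known VMD system; the function name `fit_pixel_gmm`, the number of components, the iteration count, and the variance floor are all assumptions.

```python
import numpy as np

def fit_pixel_gmm(samples, k=2, iterations=30, min_var=1.0):
    """Fit a k-component 1D Gaussian mixture to one pixel's history with EM."""
    x = np.asarray(samples, dtype=np.float64)
    # Deterministic start (assumption): means spread evenly over the data range.
    means = np.linspace(x.min(), x.max(), k)
    variances = np.full(k, max(x.var(), min_var))
    weights = np.full(k, 1.0 / k)
    for _ in range(iterations):
        # E-step: responsibility of each component for each sample.
        diff = x[:, None] - means[None, :]
        log_resp = (np.log(weights)
                    - 0.5 * (diff ** 2 / variances + np.log(2.0 * np.pi * variances)))
        log_resp -= log_resp.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        spread = (resp * (x[:, None] - means[None, :]) ** 2).sum(axis=0) / nk
        variances = np.maximum(spread, min_var)  # floor avoids collapse
    return weights, means, variances

# Synthetic history (assumption): the pixel mostly shows a dark surface and
# occasionally a brighter one, for example a swaying branch.
rng = np.random.default_rng(0)
history = np.concatenate([rng.normal(50.0, 3.0, 160), rng.normal(120.0, 5.0, 40)])
weights, means, variances = fit_pixel_gmm(history)
print(weights, means)
```

Every pixel needs its own mixture, and every EM iteration touches every stored sample of every pixel, so the cost scales with frame size, history length, component count, and iteration count.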