Video cameras, such as Pan-Tilt-Zoom (PTZ) cameras, are omnipresent nowadays, and are often used for surveillance purposes. The cameras capture more data (video content) than human viewers can process. Automatic analysis of video content is therefore needed.
The terms foreground objects and foreground refer to transient objects that appear in a scene captured on video. Such transient objects may include, for example, moving humans. The remaining part of the scene is considered to be a background region, even if the remaining part includes movement, such as water ripples or grass moving in the wind.
An important step in the processing of video content is the separation of video data into foreground objects and a background scene, or background. This process is called foreground/background separation. Such separation allows for further analysis, such as detection of specific foreground objects, or tracking of moving objects. Such further analysis has many applications, including, for example, automated video surveillance and statistics gathering, such as people counting.
One method of foreground/background separation is statistical scene modelling. In one example, a number of Gaussian distributions are maintained for each pixel to model the recent history of the pixel. When a new input frame is received, each pixel from the input frame is evaluated against the Gaussian distributions maintained by the model at the corresponding pixel location. If the input pixel matches one of the Gaussian distributions, then the parameters of the associated Gaussian distribution are updated with an adaptive learning rate. Otherwise, a new Gaussian model for the pixel is created.
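By way of illustration only, the per-pixel matching and update described above may be sketched as follows for a single greyscale pixel. The names `Gaussian` and `PixelModel`, the match test of 2.5 standard deviations, the initial and minimum variances, the particular adaptive learning rate (the reciprocal of the match count, floored at `min_rate`), and the replacement of the least-matched Gaussian when the model is full are all assumptions made for this sketch and are not taken from any particular existing technique.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gaussian:
    mean: float
    var: float
    hits: int = 1  # number of observations that have matched this Gaussian


class PixelModel:
    """Recent history of one greyscale pixel, modelled by a few Gaussians."""

    def __init__(self, max_modes: int = 3, match_sigmas: float = 2.5,
                 init_var: float = 100.0, min_var: float = 4.0,
                 min_rate: float = 0.05) -> None:
        # All defaults are hypothetical values chosen for illustration.
        self.modes: List[Gaussian] = []
        self.max_modes = max_modes
        self.match_sigmas = match_sigmas
        self.init_var = init_var
        self.min_var = min_var
        self.min_rate = min_rate

    def update(self, value: float) -> bool:
        """Fold one input pixel value into the model.

        Returns True if the value matched an existing Gaussian, and
        False if a new Gaussian had to be created for it.
        """
        for g in self.modes:
            diff = value - g.mean
            if diff * diff <= self.match_sigmas ** 2 * g.var:
                g.hits += 1
                # Assumed adaptive rate: large for young Gaussians,
                # settling at min_rate once the Gaussian is established.
                rate = max(self.min_rate, 1.0 / g.hits)
                g.mean += rate * diff
                g.var = max(self.min_var,
                            (1.0 - rate) * g.var + rate * diff * diff)
                return True
        # No match: create a new Gaussian, evicting the least-matched
        # one if the model is already full (an assumption of the sketch).
        if len(self.modes) >= self.max_modes:
            self.modes.remove(min(self.modes, key=lambda g: g.hits))
        self.modes.append(Gaussian(mean=value, var=self.init_var))
        return False
```

In such a sketch, one `PixelModel` would be kept per pixel location and `update` called once per input frame; how the Gaussians are then classified as foreground or background is outside the scope of the example.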
Another method of foreground/background separation maintains two pixel-based background models, B1 and B2. B1 contains the minimum value for each pixel location during the initialisation period and B2 contains the maximum value. When a new frame is received, the difference between the input frame and each of the background models is computed on a per-pixel basis. For each pixel, the corresponding model with the smallest difference for that pixel is updated using an approximated median update method with a fixed learning rate.
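A minimal sketch of the two-model arrangement described above is given below, assuming frames are represented as flat lists of greyscale values. The function names `init_models` and `update_models`, the `step` parameter standing in for the fixed learning rate, and the rule that a tie between the two differences is resolved in favour of B1 are assumptions of the sketch. The approximated median update is taken here to mean moving the model value by one fixed step towards the input value.

```python
from typing import List, Sequence, Tuple

Frame = List[int]  # hypothetical representation: flat list of grey values


def init_models(frames: Sequence[Frame]) -> Tuple[Frame, Frame]:
    """Build B1 (per-pixel minimum) and B2 (per-pixel maximum) from the
    frames captured during the initialisation period."""
    b1 = [min(values) for values in zip(*frames)]
    b2 = [max(values) for values in zip(*frames)]
    return b1, b2


def update_models(b1: Frame, b2: Frame, frame: Frame, step: int = 1) -> None:
    """Update, in place, whichever model is closer to each input pixel."""
    for i, value in enumerate(frame):
        # Select the model with the smallest difference at this pixel
        # (ties go to B1, which is an assumption of this sketch).
        model = b1 if abs(value - b1[i]) <= abs(value - b2[i]) else b2
        # Approximated median: nudge the model towards the input by a
        # fixed step rather than recomputing a true median.
        if value > model[i]:
            model[i] += step
        elif value < model[i]:
            model[i] -= step


b1, b2 = init_models([[10, 50], [20, 40], [15, 60]])
update_models(b1, b2, [12, 70])
print(b1, b2)
```

With a fixed step, each model drifts towards the recent inputs at a constant rate, which is consistent with the observation that such a model tracks gradual changes but lags behind large and fast ones.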
Another technique uses a double background model that is able to handle both rapid and gradual changes of the scene. In order to do that, a normal background model is derived from a list of cached frames that were sampled at a constant rate. The double background model system also tries to detect a large change condition in the scene. Only once a large change condition is detected is a new background model created, based on another list of cached frames that were sampled at a faster rate than the normal background model.
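The double background arrangement may be sketched, under several stated assumptions, as follows. The description above does not specify how a background model is derived from the cached frames, nor how a large change condition is detected. The sketch assumes a per-pixel median over the cache, and assumes that a large change is declared when more than a given fraction of pixels differ from the current background by more than a threshold. The class name `DoubleBackground`, all parameter names and defaults, and the step of reseeding the slowly sampled cache after a large change are hypothetical.

```python
from collections import deque
from statistics import median
from typing import Deque, List, Optional, Sequence

Frame = List[int]  # hypothetical representation: flat list of grey values


def build_background(frames: Sequence[Frame]) -> List[float]:
    """Assumed derivation: per-pixel median over the cached frames."""
    return [median(values) for values in zip(*frames)]


class DoubleBackground:
    def __init__(self, slow_every: int = 10, fast_every: int = 2,
                 cache_len: int = 5, pixel_thresh: int = 30,
                 change_ratio: float = 0.5) -> None:
        self.slow: Deque[Frame] = deque(maxlen=cache_len)  # normal rate
        self.fast: Deque[Frame] = deque(maxlen=cache_len)  # faster rate
        self.slow_every = slow_every
        self.fast_every = fast_every
        self.pixel_thresh = pixel_thresh
        self.change_ratio = change_ratio
        self.count = 0
        self.background: Optional[List[float]] = None

    def process(self, frame: Frame) -> bool:
        """Cache the frame, refresh the background model, and return
        True if a large change condition was detected on this frame."""
        large_change = False
        if self.background is not None:
            changed = sum(1 for v, b in zip(frame, self.background)
                          if abs(v - b) > self.pixel_thresh)
            large_change = changed > self.change_ratio * len(frame)

        slow_sample = self.count % self.slow_every == 0
        if self.count % self.fast_every == 0:
            self.fast.append(list(frame))
        if slow_sample:
            self.slow.append(list(frame))
        self.count += 1

        if large_change:
            # New background model from the faster-sampled cache.
            self.background = build_background(self.fast)
            # Assumption: restart the normal cache from the new scene.
            self.slow = deque(self.fast, maxlen=self.slow.maxlen)
        elif slow_sample:
            self.background = build_background(self.slow)
        return large_change
```

In this sketch the faster-sampled cache is used only while the large change condition holds, so the background follows a rapid change within a few frames. The sketch makes no attempt to recognise a reversion to an earlier steady state.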
The development of a robust scene model is crucial for producing accurate foreground/background separation. One of the main challenges in building a robust scene model is adapting to changes in the scene. Some existing techniques handle gradual and slow changes very well. However, when changes in a scene become large and fast, the models of those existing techniques cannot catch up with the changes, and consequently produce a large amount of false foreground detection. A second type of change, in which a prolonged large and fast change from a steady state is followed by a quick and sudden reversion to the steady state, also cannot be handled by existing techniques, because handling this type of change requires handling both the large and fast change and the sudden reversion. Existing methods that handle a large and fast change do not handle a sudden reversion. Conversely, existing methods that handle a sudden reversion do not handle a large and fast change.
Thus, a need exists for an improved method of video object detection.