A video is a sequence of images. The images may also be referred to as frames. The terms ‘frame’ and ‘image’ are used interchangeably throughout this specification to describe a single image in an image sequence, or a single frame of a video. An image is made up of pixels where each pixel is represented by one or more values representing the visual properties at that pixel. For example, in one scenario three (3) values are used to represent the visual properties of a pixel: Red, Green and Blue colour intensity of each pixel.
The terms foreground objects and foreground refer to transient objects that appear in a scene captured on video. Such transient objects may include, for example, moving humans. The remaining part of the scene is considered to be background, even where the remaining part includes minor movement, such as water ripples or grass moving in the wind.
Scene modelling, also known as background modelling, involves modelling the visual content of a scene, based on an image sequence depicting the scene. One use of scene modelling is foreground segmentation by background subtraction. Foreground segmentation is also known as foreground/background separation. Foreground segmentation may also be described by its inverse (i.e., background segmentation). Examples of foreground segmentation applications include activity detection, unusual object or behaviour detection, and scene analysis.
Foreground segmentation allows a video analysis system to distinguish between transient foreground objects and the non-transient background through scene modelling of the non-transient background, and a differencing operation between that background and incoming frames of video. Foreground segmentation can be performed by using scene modelling and identifying portions of the modelled scene which are either moving, or recently changed/added, or both.
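For illustration only, the differencing operation described above may be sketched as follows. The sketch assumes greyscale frames stored as nested lists and a fixed background image. The function name `subtract_background` and the `threshold` value are hypothetical and are not taken from any particular arrangement.

```python
from typing import List

Frame = List[List[int]]  # greyscale intensities, one inner list per image row


def subtract_background(frame: Frame, background: Frame,
                        threshold: int = 30) -> List[List[bool]]:
    """Return a mask that is True where the frame differs from the background.

    The threshold is an assumed tuning parameter, not a prescribed value.
    """
    return [
        [abs(pixel - bg_pixel) > threshold
         for pixel, bg_pixel in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# A static background and an incoming frame with a bright object in column 1.
background = [[10, 10, 10],
              [10, 10, 10]]
frame = [[12, 200, 11],
         [9, 190, 10]]

mask = subtract_background(frame, background)
```

In this sketch, only the pixels in the middle column exceed the threshold and are marked as foreground. The small variations in the other columns remain background.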
In one scene modelling method, the content of an image is divided into one or more visual elements, and a model of the appearance of each visual element is determined. Examples of possible visual elements include a pixel or an 8×8 DCT block. A scene model may maintain a number of models for each visual element location, with each of the maintained models representing a different mode of appearance at that location within the scene model. The models maintained by a scene model are known as mode models, and mode models that correspond to background visual elements are known as background modes. For example, there might be one mode model for a visual element in a scene with a light being on, and a second mode model for the same visual element at the same location in the scene with the light off.
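For illustration only, the maintenance of several mode models at one visual element location may be sketched as follows. The sketch assumes that each visual element is summarised by a single intensity value, that a mode model stores a running mean, and that an observation matches a mode model when it lies within a tolerance of that mean. The names `ModeModel`, `ElementModel`, `observe` and `match_tol` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModeModel:
    appearance: float  # running mean of the values matched to this mode
    hits: int = 1      # number of observations matched to this mode


@dataclass
class ElementModel:
    """All mode models kept for one visual element location."""
    modes: List[ModeModel] = field(default_factory=list)

    def observe(self, value: float, match_tol: float = 20.0) -> ModeModel:
        """Match value to the closest mode within match_tol, else add a mode."""
        best = min(self.modes,
                   key=lambda m: abs(m.appearance - value),
                   default=None)
        if best is not None and abs(best.appearance - value) <= match_tol:
            best.hits += 1
            # Incremental update of the running mean.
            best.appearance += (value - best.appearance) / best.hits
            return best
        new_mode = ModeModel(appearance=value)
        self.modes.append(new_mode)
        return new_mode


# One location observed with the light on, then off, then on again.
element = ElementModel()
for value in [200, 202, 198, 40, 42, 201]:
    element.observe(value)
```

After these observations, the element holds two mode models, one near the light-on appearance and one near the light-off appearance. How a mode model is classified as a background mode is not shown in this sketch.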
One particular challenge to scene modelling is the “camouflage” problem. Camouflage is caused by areas of foreground that are similar in appearance to background. These areas of foreground are typically misclassified as background by scene modelling methods. When parts of a foreground object are not detected (for example, due to the camouflage problem), higher level analysis (such as object tracking and activity detection) can fail. For example, a foreground object may be detected as two separate parts due to misclassifications, so that a tracking module will track two separate objects and a counting module will count two objects instead of one.
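For illustration only, the effect of camouflage on a counting module may be sketched as follows. The sketch assumes a binary foreground mask and counts objects as 4-connected regions. The function name `count_components` and the example masks are hypothetical.

```python
from typing import List

Mask = List[List[int]]  # 1 = foreground, 0 = background


def count_components(mask: Mask) -> int:
    """Count 4-connected foreground regions using an iterative flood fill."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x),
                                   (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count


# A single upright object occupying two columns of the mask.
true_object = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]

# The same object with a camouflaged band misclassified as background.
detected = [row[:] for row in true_object]
detected[2] = [0, 0, 0, 0]

true_count = count_components(true_object)
detected_count = count_components(detected)
```

In this sketch, the complete mask yields one object, whereas the mask with the misclassified band yields two.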
Some scene modelling techniques use post-processing steps to reduce the camouflage problem. A typical post-processing step may be a median filter, or a morphological operation. These steps rely on most of the foreground in an area being detected. Large areas of foreground that have been misclassified as background cannot be recovered by median filters and morphological operations. Other solutions may perform hole filling of connected components. However, hole filling can change areas of true background to foreground (for example, the gap between a person's legs).
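For illustration only, the limitation of median filtering may be sketched as follows. The sketch assumes a binary foreground mask, a 3×3 window, and that positions outside the image count as background. The function name `median_filter_3x3` and the example masks are hypothetical.

```python
from typing import List

Mask = List[List[int]]  # 1 = foreground, 0 = background


def median_filter_3x3(mask: Mask) -> Mask:
    """Apply a 3x3 median filter to a binary mask.

    Positions outside the image are treated as 0 (an assumed border rule).
    """
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ones = sum(
                mask[y][x]
                for y in range(r - 1, r + 2)
                for x in range(c - 1, c + 2)
                if 0 <= y < rows and 0 <= x < cols
            )
            # The median of nine binary values is 1 when at least five are 1.
            out[r][c] = 1 if ones >= 5 else 0
    return out


# A detected object with a single misclassified pixel at its centre.
small_hole = [[1] * 5 for _ in range(5)]
small_hole[2][2] = 0

# A detected object with a 3x3 misclassified area at its centre.
large_hole = [[1] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        large_hole[r][c] = 0

small_fixed = median_filter_3x3(small_hole)
large_fixed = median_filter_3x3(large_hole)
```

In this sketch, the single misclassified pixel is restored to foreground because most of its neighbours are foreground. The centre of the larger misclassified area remains background because no foreground lies within its window.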
Other scene modelling techniques have used Markov Random Field techniques, such as the graph cut algorithm, to improve robustness to misclassification of visual elements. However, such techniques are computationally expensive, particularly for substantially real-time surveillance applications, and are still unreliable for large areas of misclassified foreground.
Thus, a need exists for an improved approach to scene modelling that is both robust to camouflage scenarios with large areas of misclassified foreground and relatively computationally inexpensive.