Video cameras, such as Pan-Tilt-Zoom (PTZ) cameras, are omnipresent nowadays, mostly for surveillance purposes. The cameras capture more data (video content) than human viewers can process. Automatic analysis of video content is therefore needed.
An important step in the processing of video content is the segmentation of video data into foreground objects and a background scene, or background. Such segmentation allows for further analysis, such as detection of specific foreground objects, or tracking of moving objects. Such further analysis may, for example, result in sending an alert to a security guard.
Automatic analysis is also relevant to PTZ cameras. PTZ cameras may change their field of view without human intervention based on preset orientations, or even based on the observed video content. For example, when tracking a walking person, the camera may automatically pan to keep the person within the field of view.
A common approach to foreground/background segmentation is background subtraction. For example, the median pixel value observed over time at a position in a scene may be compared against the current pixel value at that position. If the current pixel value is similar to the median pixel value, the pixel is considered to belong to the background; otherwise, the pixel is considered to belong to a foreground object. The challenge for such approaches is to define similarity and to remain robust against stationary foreground objects that may be confused with background.
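By way of illustration only, the median-based comparison described above may be sketched as follows. The sketch assumes grayscale frames stored as nested lists of integers, and it defines similarity as an absolute difference within a fixed threshold. The function names `median_background` and `foreground_mask` and the default `threshold` of 25 are hypothetical choices made for this example and are not taken from any particular method.

```python
from statistics import median


def median_background(frames):
    """Build a background model as the per-pixel median over past frames.

    `frames` is a non-empty list of equally sized grayscale frames,
    each a list of rows of integer pixel values (an assumed layout).
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


def foreground_mask(frame, background, threshold=25):
    """Classify each pixel of `frame` against the background model.

    A pixel is foreground (True) when it differs from the median value
    by more than `threshold`; otherwise it is background (False).
    The threshold is an illustrative definition of "similar".
    """
    return [
        [abs(pixel - bg) > threshold for pixel, bg in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]
```

In such a sketch, an object that remains stationary for more than half of the frames used to build the model would shift the median towards its own pixel values and would then be classified as background, which is one form of the confusion noted above.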
Although more complex background modelling methods are known in the art, these methods are computationally expensive, and their memory requirements typically do not allow the methods to be embedded on devices such as cameras.