Video cameras, such as Pan-Tilt-Zoom (PTZ) cameras, are omnipresent nowadays, mostly for surveillance purposes. The cameras capture more data (video content) than human eyes can process. Automatic analysis of video content is therefore needed.
An essential step in the processing of video content is the segmentation of video data into foreground and background. Such segmentation allows for further analysis, such as detection of specific foreground objects, or tracking of moving objects. Such further analysis may, for example, result in an alert to a security guard.
Automatic analysis is also relevant to PTZ cameras. PTZ cameras may change their field of view without human intervention based on preset orientations, or even based on the observed video content. For example, when tracking a walking person, the camera may pan to keep the person within the field of view.
A common approach to foreground/background segmentation is background subtraction. For example, the median pixel value for a position in a scene may be compared against the current pixel value at that position. If the current pixel value is similar to the median pixel value, the pixel is considered to belong to the background; otherwise, the pixel is considered to belong to a foreground object. The challenge for such approaches is to define similarity. Techniques from the field of machine learning cannot be applied immediately to solve this challenge, because such techniques depend on the availability of sufficient training data, and generating training data is a significant expense in terms of human resources.
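The median-based comparison described above can be illustrated with a minimal sketch. It is not taken from the source: the function names `median_background` and `foreground_mask`, the fixed `threshold` of 25, and the use of an absolute grey-level difference as the similarity measure are all assumptions made for illustration. Choosing that measure is precisely the open challenge noted above.

```python
from statistics import median


def median_background(frames):
    """Per-pixel median over a list of equally sized grayscale frames.

    Each frame is a list of rows; each row is a list of pixel intensities.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


def foreground_mask(frame, background, threshold=25):
    """Return True for pixels whose absolute difference from the
    background median exceeds `threshold` (an assumed similarity test)."""
    return [
        [abs(pixel - bg) > threshold for pixel, bg in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]


# Hypothetical 2x2 frames: three history frames and one current frame.
history = [
    [[10, 10], [200, 12]],
    [[12, 11], [198, 10]],
    [[11, 9], [202, 11]],
]
current = [[11, 120], [201, 10]]

background = median_background(history)
mask = foreground_mask(current, background)
```

In this toy input only the top-right pixel departs markedly from its median, so it is the only one marked as foreground. A single global threshold is the simplest possible choice and would be sensitive to noise and lighting changes in practice.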
Although more complex background modelling methods are known in the art, these methods are computationally expensive, and their memory requirements prevent them from being embedded on devices such as cameras.