Video cameras, such as Pan-Tilt-Zoom (PTZ) cameras, are omnipresent nowadays, mostly for surveillance purposes. The video cameras capture more data (hereinafter referred to as “video content” or “video data”) than human viewers can typically process. Automatic analysis of video content is therefore needed.
One step often used in the processing of video content is the segmentation of video data into foreground objects and a background scene, or background. Such segmentation allows for further analysis, such as detection of specific foreground objects, or tracking of moving objects. Such further analysis may, for example, result in an alert being sent to a security guard upon detection of a foreground object, such as a suspicious box, or upon tracking an object, such as an unauthorised intruder, entering or leaving a predefined area of interest.
Two aspects of such analysis are of particular interest.
First is the detection of abandoned objects. An example of an abandoned object is an item of luggage that is brought into the scene being monitored, such as an airport lobby, during a time interval covering a sequence of video frames, and is subsequently left in the scene.
Second is the detection of removed objects. An example of a removed object is a painting hanging on the wall in an art gallery that was previously considered to be part of the background of the scene being monitored, and that is removed from the scene during a time interval covering a sequence of video frames.
A common approach to foreground/background segmentation is referred to as “background subtraction”. In one example of this method, a median pixel value for a pixel at a specified position in a scene, determined using pixel values for that pixel over a sequence of video frames, may be compared to a current pixel value for the pixel at that position in a current frame of the sequence of video frames. If the current pixel value is similar to the median pixel value, the pixel is considered to belong to the background. If, however, the current pixel value is not similar to the median pixel value, that is, the difference between the current pixel value and the median pixel value exceeds a specified threshold, then the pixel is considered to belong to a foreground object.
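The median-based comparison described above can be illustrated with a minimal sketch. It assumes greyscale frames stored as nested lists of integers; the function names `median_background` and `foreground_mask`, the sample pixel values and the threshold of 25 are hypothetical choices made for illustration only and are not taken from any particular system.

```python
from statistics import median


def median_background(frames):
    """Return the per-pixel median over a sequence of equally sized frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


def foreground_mask(current, background, threshold):
    """Mark a pixel True (foreground) when it differs from the background
    median by more than the threshold, and False (background) otherwise."""
    return [
        [abs(c - b) > threshold for c, b in zip(cur_row, bg_row)]
        for cur_row, bg_row in zip(current, background)
    ]


# Three earlier 2x2 frames; one pixel is briefly disturbed in the third frame.
history = [
    [[10, 10], [10, 10]],
    [[12, 11], [10, 9]],
    [[11, 10], [200, 10]],
]
background = median_background(history)

# In the current frame only the bottom-right pixel departs from the median.
current = [[11, 10], [10, 90]]
mask = foreground_mask(current, background, threshold=25)
```

Using the median rather than the mean means that the brief disturbance in the third frame does not shift the remembered background value for that pixel.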
Using the aforementioned background subtraction approach, regions of change resulting from abandoned object events and removed object events are found to have similar properties. This is because both abandoned object events and removed object events result in a region of the scene that is different from a previously remembered background but is otherwise not changing. It is sometimes advantageous to use a common technique to determine when either an abandoned object event or a removed object event has occurred, in order, for example, to signal an alert.
However, a problem with using such an approach to differentiate between abandoned object events and removed object events is that, to the detection system, the events are indistinguishable from each other. This can be a problem in circumstances where it is desirable for a surveillance system to be able to both draw the attention of an operator to the occurrence of such events, and also to be able to give different alerts based on the type of event that has occurred. In some cases abandoned object events may be of greater importance than removed object events, such as in the case of a suitcase being abandoned in a busy airport. In other cases removed object events may be of greater importance than abandoned object events, such as the case of a painting being removed from a wall in an art gallery.
It is possible to differentiate between the two events, to some degree, through an examination of boundary pixels of detected regions of change. It is found, for example, that a “strong” boundary on the region of change is more likely to indicate an abandoned object event, and a “weak” boundary is more likely to indicate a removed object event. Methods used to measure boundary strength include considerations of colour, texture and gradient. However, these pixel-based methods are costly in terms of both memory usage and computation time.
Moreover, a specified threshold is usually required in such methods. The boundary strength is compared to the threshold in order to make a final decision. In practice, the value of such a threshold is difficult to determine, and may be dependent upon the scene, or area of a scene. Learning a threshold for a scene, or for an area of a scene, requires heavy computation and thus is not feasible for camera applications.
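The boundary-based differentiation and its dependence on a threshold can be illustrated with a minimal sketch. It assumes one simple gradient-style measure, namely the mean absolute intensity step between each boundary pixel of the region of change and its 4-connected neighbours outside the region, evaluated in the current frame. The function names `boundary_strength` and `classify_event`, the sample frames and the threshold of 30 are hypothetical; practical methods may combine colour, texture and gradient in other ways.

```python
def boundary_strength(frame, region):
    """Mean absolute intensity step across the boundary of `region`.

    `frame` is a greyscale image and `region` is a boolean mask of the
    detected region of change, both given as nested lists of equal size.
    """
    height, width = len(frame), len(frame[0])
    steps = []
    for y in range(height):
        for x in range(width):
            if not region[y][x]:
                continue
            # Compare each region pixel with its neighbours outside the region.
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width and not region[ny][nx]:
                    steps.append(abs(frame[y][x] - frame[ny][nx]))
    return sum(steps) / len(steps) if steps else 0.0


def classify_event(frame, region, threshold):
    """Label a strong boundary as 'abandoned' and a weak one as 'removed'."""
    if boundary_strength(frame, region) > threshold:
        return "abandoned"
    return "removed"


# Region of change: the central 2x2 block of a 4x4 scene.
region = [[1 <= y <= 2 and 1 <= x <= 2 for x in range(4)] for y in range(4)]

# Abandoned object: a bright item now sits on a darker floor.
abandoned_frame = [[200 if region[y][x] else 50 for x in range(4)] for y in range(4)]

# Removed object: the uniform wall is now visible where the object used to be.
removed_frame = [[50 for _ in range(4)] for _ in range(4)]

abandoned_label = classify_event(abandoned_frame, region, threshold=30)
removed_label = classify_event(removed_frame, region, threshold=30)
```

The sketch visits every pixel of the frame and relies on a hand-picked threshold, which reflects the computation cost and threshold-selection difficulty noted above.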