In video surveillance applications, it is common to segment a sequence of images acquired of a scene by a camera into background and foreground portions so that objects in the scene can be detected and tracked. It is often assumed that the background portion is completely static, or slowly or periodically changing, while the foreground portion corresponds to groups of adjacent pixels that change much more rapidly than the pixels in the background portion.
A number of background subtraction methods are known that are robust to changes in lighting, pixel noise, camera position, and the like. A simple method marks the pixels in an image whose values are different from those of an image of the scene without any foreground objects. Such a method is often used for indoor scenes where the lighting and scene geometry can be tightly controlled.
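The simple method described above can be sketched as follows. This is only a minimal illustration under stated assumptions: images are taken to be grayscale and represented as nested lists of intensities, and the function name `foreground_mask` and the threshold value of 25 are hypothetical choices made for the example, not values taken from any particular system.

```python
def foreground_mask(frame, reference, threshold=25):
    """Mark pixels that differ from the empty-scene reference image.

    `threshold` is an assumed, hand-picked tolerance for pixel noise.
    """
    return [
        [abs(p - r) > threshold for p, r in zip(frame_row, ref_row)]
        for frame_row, ref_row in zip(frame, reference)
    ]


# Reference image of the scene without any foreground objects.
reference = [[10, 10, 10], [10, 10, 10]]
# Current image: the middle column is occupied by a bright object.
frame = [[12, 200, 9], [10, 180, 11]]

mask = foreground_mask(frame, reference)
```

Because the reference image is fixed, any change in lighting shifts every pixel away from the reference, which is why such a method is suited mainly to tightly controlled scenes.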
To handle multimodal backgrounds, a background model can be used. Often, the model is in the form of Gaussian distributions. The background model is updated for each successive image or frame by an iterative update mechanism, e.g., on-line expectation maximization (EM). However, on-line EM blends weak modes into stronger ones and distorts the model mean values.
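The way an iterative per-frame update can distort a mean value can be illustrated with a deliberately simplified stand-in. The sketch below is not on-line EM for a mixture; it assumes a single Gaussian per pixel updated with a fixed learning rate, and the name `update_gaussian` and the rate `alpha` are hypothetical.

```python
def update_gaussian(mean, var, x, alpha=0.05):
    """Exponentially weighted update of one Gaussian's mean and variance.

    `alpha` is an assumed learning rate; larger values adapt faster.
    """
    diff = x - mean
    mean = mean + alpha * diff
    var = (1.0 - alpha) * (var + alpha * diff * diff)
    return mean, var


# A pixel that shows a strong mode (value 50) nine frames out of ten
# and a weak mode (value 200) one frame out of ten.
observations = ([50] * 9 + [200]) * 50

mean, var = 50.0, 1.0
for x in observations:
    mean, var = update_gaussian(mean, var, x)
```

After the loop, the estimated mean lies above the strong mode and well below the weak one, so it matches neither of the values the pixel actually takes.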
To achieve accurate adaptation of background models, a Bayesian update procedure can be used, which can also estimate the number of required models. That procedure can handle illumination variations and other arbitrary changes in the scene. There are also variants of the mixture-of-models background that use image gradient and optical flow information. The mixture-of-models approaches can converge to any arbitrary distribution provided that enough observations are available. However, the computational cost grows exponentially as the number of models in the mixture increases.
Another background modeling method uses non-parametric kernel density estimation. That method stores color values of pixels in the images in the sequence and estimates the contributions of a set of kernel functions using all of the data instead of iteratively updating background models at each frame. Both memory and computational cost are proportional to the number of images. As a result, kernel-based methods are impractical for real-time applications that acquire images continuously over long periods of time.
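The cost of the kernel-based approach can be seen in a minimal sketch for one pixel. It assumes grayscale intensities, a Gaussian kernel, and a hand-picked bandwidth; the name `kde_probability` and the sample values are hypothetical.

```python
import math


def kde_probability(x, samples, bandwidth=10.0):
    """Estimate the density of value x from all stored samples of a pixel.

    Every stored sample contributes one kernel evaluation, so both the
    storage and the work per pixel grow with the number of images kept.
    """
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return norm * total / len(samples)


# Stored history of one background pixel over five images.
history = [100, 102, 98, 101, 99]

p_background = kde_probability(100, history)  # value close to the history
p_foreground = kde_probability(200, history)  # value far from the history
```

A pixel would be marked as foreground when its estimated density falls below some chosen threshold, which is left out here because it depends on the application.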
Another method, based on 3D geometry, allows arbitrary changes in illumination but assumes that the background is geometrically static. That method uses a stereo camera. Pixels that violate a pre-computed disparity model of the empty scene are marked as the foreground.
Frequency-based techniques give good results when motion in the background is strongly periodic. For instance, coastal surveillance systems can take into account the periodicity of ocean waves and effectively remove that effect by explicitly modeling the pixel-wise periodicity of the observations.
Another segmentation method adapts to the color composition of foreground objects while maintaining a model of the background. Even though that method aims to combine the benefits of pixel-, motion-, and region-based techniques, it has problems with periodic motion as well as non-convex objects. Prior knowledge can be integrated into the background detection. Because a full covariance matrix is computed, the feature space can be modified to include other information sources, such as motion information.
However, there is a class of problems that conventional two-class segmentation methods cannot solve. Common sense dictates that an object left behind in a public place, such as a suitcase, backpack, or package, can pose a significant security risk. Unfortunately, such an object does not qualify as either background or foreground. When the object enters the scene, it is foreground. After being left behind, the object becomes background. However, it is crucial that the object is not totally ignored. Furthermore, it is possible that the object is removed still later. Therefore, its presence in the scene should not be forgotten.
Methods are known that can detect left-behind objects: J. D. Courtney, “Automatic video indexing via object motion analysis,” PR 30(4), pp. 607-625, 1997; E. Auvinet, E. Grossmann, C. Rougier, M. Dahmane, and J. Meunier, “Left-luggage detection using homographies and simple heuristics,” in PETS, pp. 51-58, 2006; J. M. del Rincón, J. E. Herrero-Jaraba, J. R. Gómez, and C. Orrite-Uruñuela, “Automatic left luggage detection and tracking using multi-camera UKF,” in PETS, pp. 59-66, 2006; P. T. N. Krahnstoever, T. Sebastian, A. Perera, and R. Collins, “Multi-view detection and tracking of travelers and luggage in mass transit environments,” in PETS, pp. 67-74, 2006; K. Smith, P. Quelhas, and D. Gatica-Perez, “Detecting abandoned luggage items in a public space,” in PETS, pp. 75-82, 2006; and S. Guler and M. K. Farrow, “Abandoned object detection in crowded places,” in PETS, pp. 99-106, 2006.
The main drawback of most prior art methods is that, in order to identify the portions of the video images corresponding to an object that has been left behind, those methods must first solve the much harder problem of object tracking or object detection as an intermediate step. Tracking objects in complex real-world scenes in real time is difficult.