In video surveillance, it is important to be able to detect moving objects in a scene as captured in a video sequence. There are many tools for motion detection in videos. Some of them track objects frame by frame by following features in the video stream. Others compare a current frame with a static background frame, pixel by pixel. The latter is the basis of background subtraction, which aims to extract moving objects by detecting regions where significant change occurs. Moving objects are referred to as foreground, while static objects are part of the background.
The separation of moving objects from the background is a complex problem, which becomes even more difficult if the background is dynamic, for example if there are swaying trees or water ripples in the background, or if the illumination varies. In particular, a dynamic background may increase the number of false detections of moving objects.
A review of background subtraction methods is given in the textbook “Background Modeling and Foreground Detection for Video Surveillance” (Editors: Thierry Bouwmans, Fatih Porikli, Benjamin Höferlin, and Antoine Vacavant), CRC Press, Taylor & Francis Group, Boca Raton, 2015. See, for example, chapters 1 and 7.
Background subtraction methods generally involve a comparison of a current frame of a video stream with a reference background frame or model that is free of moving objects. By comparing an image to the background frame or model, each pixel in the image may be classified as belonging to either the foreground or the background. In this way, the image may be divided into two complementary sets of pixels: the foreground and the background.
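By way of illustration only, the pixel-by-pixel comparison described above may be sketched as follows. The function name `foreground_mask`, the threshold value of 25, and the tiny grayscale frames are assumptions made for this example and are not taken from any particular method.

```python
def foreground_mask(frame, background, threshold=25):
    """Classify each pixel as foreground (True) or background (False).

    A pixel is taken to be foreground when its absolute difference from
    the reference background frame exceeds `threshold` (an assumed value).
    """
    return [
        [abs(pixel - reference) > threshold
         for pixel, reference in zip(frame_row, background_row)]
        for frame_row, background_row in zip(frame, background)
    ]


# Hypothetical 2x3 grayscale frames; the middle column contains a moving object.
background = [[10, 10, 10], [10, 10, 10]]
frame = [[12, 200, 9], [10, 180, 11]]
mask = foreground_mask(frame, background)
```

The resulting mask and its complement correspond to the two complementary sets of pixels referred to above.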
Background subtraction requires the definition of an underlying background model and an update strategy to accommodate background changes over time. Many background models have been proposed in the literature. These include parametric models and non-parametric models.
An example of a parametric model is to model the background at a pixel location in the image by a Gaussian distribution. This may work well for a static scene, but will fail if the background pixel values follow a multi-modal distribution, e.g., if there are waving trees in the background.
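A minimal sketch of such a single-Gaussian model for one pixel is given below. The class name `GaussianPixelModel`, the running-average update rule, the learning rate, and the factor `k` are assumptions chosen for illustration; published methods differ in these details.

```python
import math


class GaussianPixelModel:
    """Background model for one pixel: a single Gaussian (mean, variance)."""

    def __init__(self, mean, variance=25.0, learning_rate=0.05, k=2.5):
        self.mean = mean
        self.variance = variance
        self.learning_rate = learning_rate  # assumed adaptation speed
        self.k = k  # assumed number of standard deviations tolerated

    def is_background(self, value):
        # Background if the value lies within k standard deviations of the mean.
        return abs(value - self.mean) <= self.k * math.sqrt(self.variance)

    def update(self, value):
        # Exponential running estimates of the mean and the variance.
        deviation = value - self.mean
        self.mean += self.learning_rate * deviation
        self.variance = ((1 - self.learning_rate) * self.variance
                         + self.learning_rate * deviation ** 2)


model = GaussianPixelModel(mean=100.0)
near = model.is_background(104)  # within 2.5 standard deviations
far = model.is_background(180)   # rejected, even if 180 is a second background mode
```

The last line shows the limitation noted above: if the pixel legitimately alternates between values near 100 and values near 180, a single Gaussian cannot represent both modes.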
In order to deal with backgrounds that follow a multi-modal distribution, it has been proposed to model the background at a pixel location in the image by a mixture of Gaussian distributions. Although such models are effective in modelling multi-modal backgrounds, they have other drawbacks. For example, the estimation of the parameters may be difficult in a noisy real-world environment, and it has been questioned whether natural images exhibit Gaussian behaviour.
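The classification step of a mixture model may be sketched as follows. Only the matching of a pixel value against already estimated components is shown; the parameter estimation, which is the difficult part mentioned above, is omitted. The names `Component` and `is_background`, the weight threshold `min_weight`, and the component values are assumptions for this example, and the weight threshold is a simplified criterion rather than that of any specific published method.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    weight: float    # fraction of past observations explained by this mode
    mean: float
    variance: float


def is_background(value, components, k=2.5, min_weight=0.2):
    """Background if the value matches any sufficiently weighted component."""
    return any(
        c.weight >= min_weight
        and abs(value - c.mean) <= k * math.sqrt(c.variance)
        for c in components
    )


# Hypothetical pixel that alternates between a dark leaf and bright sky.
components = [
    Component(weight=0.50, mean=60.0, variance=16.0),
    Component(weight=0.45, mean=200.0, variance=25.0),
    Component(weight=0.05, mean=120.0, variance=100.0),  # rarely seen mode
]
leaf = is_background(62, components)
sky = is_background(198, components)
other = is_background(125, components)
```

Both frequently observed modes are accepted as background, whereas a value matching only the low-weight component is treated as foreground.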
Due to these drawbacks, non-parametric models have been considered in the literature. For example, non-parametric kernel density estimates of the probability density function of past pixel values have been proposed. A strength of these models is that they may quickly adapt to high-frequency events in the background. A drawback is that they may have difficulty in handling events in the background evolving at different speeds.
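A kernel density estimate of this kind may be sketched as follows, assuming a Gaussian kernel. The function names, the bandwidth, and the density threshold are illustrative assumptions only.

```python
import math


def kernel_density(value, samples, bandwidth=5.0):
    """Gaussian kernel density estimate at `value` from past pixel values."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((value - s) / bandwidth) ** 2) for s in samples
    )


def is_background(value, samples, threshold=0.01, bandwidth=5.0):
    # Background if the estimated density of the value is high enough.
    return kernel_density(value, samples, bandwidth) >= threshold


# Hypothetical recent values observed at one pixel.
samples = [100, 102, 98, 101, 99]
typical = is_background(100, samples)
unusual = is_background(150, samples)
```

Because the estimate is built directly from recent values, it follows rapid changes in those values without any distributional assumption.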
Another type of non-parametric model is referred to as a sample-based model. In such models, the background in each pixel is modelled by means of a collection of past background samples. In order to accommodate background changes over time, the collection of past background samples is updated when a pixel in a current frame has been classified as belonging to the background. Wang and Suter (“A consensus-based method for tracking: Modelling background scenario and foreground appearance”. Pattern Recognition, 40(3), 2007) propose to update the collection of past background samples according to a first-in-first-out principle. This means that the oldest background sample is removed from the collection and the pixel value of the current frame is added to the collection. U.S. Pat. No. 8,009,918 B2 describes an alternative updating approach in which the pixel value of the current frame replaces a randomly selected background sample in the collection of background samples.
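The two updating principles may be contrasted in a short sketch. The classification rule used here (a minimum number of stored samples within a given radius of the pixel value), as well as all names and parameter values, are assumptions made for illustration and are not quoted from the cited documents.

```python
import random
from collections import deque


def is_background(value, samples, radius=20, min_matches=2):
    """Background if enough stored samples lie close to the pixel value."""
    matches = sum(1 for s in samples if abs(value - s) <= radius)
    return matches >= min_matches


def update_fifo(samples, value):
    # `samples` is a deque with maxlen set, so appending drops the oldest sample.
    samples.append(value)


def update_random(samples, value, rng=random):
    # Replace one randomly selected sample with the new pixel value.
    samples[rng.randrange(len(samples))] = value


# Hypothetical collections of four past background samples for one pixel.
fifo_samples = deque([100, 101, 99, 102], maxlen=4)
random_samples = [100, 101, 99, 102]

new_value = 105
if is_background(new_value, fifo_samples):
    update_fifo(fifo_samples, new_value)
if is_background(new_value, random_samples):
    update_random(random_samples, new_value, random.Random(0))
```

With first-in-first-out updating, the age of the oldest sample is bounded by the size of the collection. With random replacement, any stored sample may be the one that is overwritten, so an old sample may survive for longer.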
A drawback of these background updating methods is that they require many background samples to be stored per pixel in order to be robust to dynamic and multi-modal backgrounds, i.e., many background samples are needed to have a long memory of modalities. This leads to undesirably high processing and memory requirements. To handle sporadic background movements, such as sudden gusts of wind, an intractably large number of background samples would be required. There is thus room for improvement.
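The memory requirement may be illustrated by simple arithmetic. The frame size, the sample counts, and the three bytes per sample used below are hypothetical values chosen for the example, not figures from the cited documents.

```python
def model_bytes(width, height, samples_per_pixel, bytes_per_sample=3):
    """Memory needed to store a sample-based background model, in bytes."""
    return width * height * samples_per_pixel * bytes_per_sample


# Hypothetical 1920x1080 colour video with 20 and 200 samples per pixel.
small_model = model_bytes(1920, 1080, 20)
large_model = model_bytes(1920, 1080, 200)
print(small_model / 1e6, "MB")
print(large_model / 1e6, "MB")
```

The storage grows linearly with the number of samples per pixel, and so does the number of comparisons needed to classify each pixel.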