In video surveillance, it is important to be able to detect moving objects in a scene as captured in a video sequence. There are many tools for motion detection in videos. Some of them track objects frame by frame by following features in the video stream. Others compare a current frame with a static background frame, pixel by pixel. The latter is the basis of background subtraction, which aims at extracting moving objects by detecting zones where significant change occurs. Moving objects are referred to as foreground, while static objects are part of the background.
The separation of moving objects from the background is a complex problem, which becomes even more difficult if the background is dynamic, for example if it contains swaying trees or water ripples, or if the illumination varies. In particular, a dynamic background may increase the number of false detections of moving objects.
A review of background subtraction methods is given in the textbook “Background Modeling and Foreground Detection for Video Surveillance” (Editors: Thierry Bouwmans, Fatih Porikli, Benjamin Hoferlin, and Antoine Vacavant), CRC Press, Taylor & Francis Group, Boca Raton, 2015. See for example chapters 1 and 7.
Background subtraction methods generally involve a comparison of a current frame of a video stream with a reference background frame or model that is free of moving objects. By comparing an image to the background frame or model, a decision may be taken as to whether each pixel in the image belongs to the foreground or to the background. In this way, the image may be divided into two complementary sets of pixels: the foreground and the background.
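The pixel-by-pixel comparison described above can be illustrated with a minimal sketch. It assumes grayscale frames stored as nested lists of integer intensities and a fixed difference threshold; the function name `subtract_background` and the threshold value of 25 are hypothetical choices made for illustration only and are not taken from any particular method.

```python
def subtract_background(frame, background, threshold=25):
    """Classify each pixel as foreground (True) or background (False).

    A pixel is marked as foreground when its absolute difference from the
    corresponding pixel of the reference background frame exceeds the
    threshold. The threshold value is an illustrative assumption.
    """
    return [
        [abs(pixel - reference) > threshold
         for pixel, reference in zip(frame_row, background_row)]
        for frame_row, background_row in zip(frame, background)
    ]


# Reference background frame, free of moving objects (2 x 3 pixels).
background = [[10, 10, 10],
              [10, 10, 10]]

# Current frame: two pixels differ strongly from the background.
frame = [[12, 200, 10],
         [10, 9, 180]]

mask = subtract_background(frame, background)
print(mask)
```

The resulting mask and its complement correspond to the two complementary sets of pixels, foreground and background. A single global threshold of this kind is the simplest possible decision rule and is sensitive to dynamic backgrounds.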
Background subtraction requires the definition of an underlying background model and an update strategy to accommodate background changes over time. Many background models have been proposed in the literature. These include parametric models (e.g., Gaussian distribution) and non-parametric models (e.g., sample-based models).
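As an illustration of a parametric model combined with an update strategy, the following sketch keeps a single Gaussian (mean and variance) for one pixel and updates it with an exponential running average. The class name `GaussianPixelModel`, the initial variance, the learning rate, and the factor `k` are all assumptions chosen for this example; this is one simple variant among many, not a definitive implementation of any published method.

```python
import math


class GaussianPixelModel:
    """Single-Gaussian background model for one pixel (illustrative sketch)."""

    def __init__(self, initial_value, initial_variance=100.0,
                 learning_rate=0.05, k=2.5):
        self.mean = float(initial_value)
        self.variance = float(initial_variance)
        self.learning_rate = learning_rate  # assumed update speed
        self.k = k  # assumed number of standard deviations tolerated

    def is_foreground(self, value):
        # Foreground if the value lies more than k standard deviations
        # away from the modelled background mean.
        return abs(value - self.mean) > self.k * math.sqrt(self.variance)

    def update(self, value):
        # Exponential running average of the mean and the variance.
        a = self.learning_rate
        diff = value - self.mean
        self.mean += a * diff
        self.variance = (1 - a) * self.variance + a * diff * diff

    def classify_and_update(self, value):
        # Update the model only with values classified as background,
        # so that foreground objects do not contaminate the model.
        foreground = self.is_foreground(value)
        if not foreground:
            self.update(value)
        return foreground


model = GaussianPixelModel(initial_value=100)
results = [model.classify_and_update(v) for v in (102, 98, 101, 200)]
print(results)
```

In this sketch the decision threshold scales with the learned variance of the pixel, so a pixel whose values fluctuate more tolerates larger deviations before being classified as foreground. Updating only on background classifications is one possible design choice; other update strategies exist.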
However, regardless of which approach to background modelling is employed, achieving a correct separation between background and foreground requires that areas of a scene representing a multi-modal environment be handled differently from more static areas when determining whether an area represents background or foreground. A multi-modal environment here means that there is a high probability that the pixel values representing the area will change between frames of the video sequence capturing the scene. Such areas inherently exhibit larger differences in image content (represented by pixel values) between frames.
There is thus a need for improvements within this context.