Many computer vision and video processing applications, in domains ranging from surveillance to human-computer interface to video compression, rely heavily on an early step, often referred to as “foreground segmentation” or “background removal,” that attempts to separate novel or dynamic objects in the scene (“foreground”) from what is normally observed (“background”). Recently, Time-Adaptive, Per-Pixel Mixtures Of Gaussians (TAPPMOGs) have become a popular choice for real-time modeling of scene backgrounds. In these methods, the time series of observations at a given image pixel is treated as independent of that for all other pixels, and is modeled using a mixture of Gaussians. The per-pixel models are updated as new observations are obtained, with older observations losing influence over time. At each time step, a subset of the Gaussians in each per-pixel model is selected as representative of the scene background, and new observations that are not well-modeled by those Gaussians are designated as foreground.
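The per-pixel procedure described above can be sketched for a single grayscale pixel as follows. This is a minimal illustration, not a definitive implementation: the class and parameter names (`PixelMixture`, `alpha`, `match_sigmas`, `bg_fraction`, `init_var`, `min_var`), the exponential update rule, the 2.5-standard-deviation match test, and the weight-over-sigma ranking are all assumptions borrowed from commonly used formulations of this kind of model, and none of these specifics is given in the text.

```python
from dataclasses import dataclass


@dataclass
class Gaussian:
    weight: float
    mean: float
    var: float


def _rank(g):
    # Assumed ranking: heavy, narrow Gaussians are the most background-like.
    return g.weight / g.var ** 0.5


class PixelMixture:
    """Time-adaptive mixture of Gaussians for one grayscale pixel (illustrative)."""

    def __init__(self, k=3, alpha=0.05, match_sigmas=2.5,
                 bg_fraction=0.7, init_var=100.0, min_var=4.0):
        self.k = k                        # maximum number of Gaussians (assumed)
        self.alpha = alpha                # adaptation rate (assumed)
        self.match_sigmas = match_sigmas  # match threshold in std. deviations
        self.bg_fraction = bg_fraction    # total weight attributed to background
        self.init_var = init_var          # variance given to a new Gaussian
        self.min_var = min_var            # floor that keeps variances from collapsing
        self.components = []              # kept sorted by _rank, best first

    def _background(self):
        """Select the subset of Gaussians currently representing background."""
        chosen, total = [], 0.0
        for g in self.components:
            chosen.append(g)
            total += g.weight
            if total > self.bg_fraction:
                break
        return chosen

    def update(self, x):
        """Classify observation x, then fold it into the model.

        Returns True if x is designated foreground.
        """
        background = self._background()

        # Find the best-ranked Gaussian that models x well, if any.
        matched = None
        for g in self.components:
            if abs(x - g.mean) <= self.match_sigmas * g.var ** 0.5:
                matched = g
                break
        is_foreground = matched is None or not any(matched is b for b in background)

        # Older observations lose influence: every weight decays.
        for g in self.components:
            g.weight *= 1.0 - self.alpha

        if matched is not None:
            matched.weight += self.alpha
            diff = x - matched.mean
            matched.mean += self.alpha * diff
            matched.var = max(self.min_var,
                              matched.var + self.alpha * (diff * diff - matched.var))
        else:
            if len(self.components) >= self.k:
                self.components.pop()     # discard the lowest-ranked Gaussian
            self.components.append(Gaussian(self.alpha, x, self.init_var))

        # Renormalize the weights and restore the ranking.
        total = sum(g.weight for g in self.components)
        for g in self.components:
            g.weight /= total
        self.components.sort(key=_rank, reverse=True)
        return is_foreground
```

In a full system one such model would be kept for every pixel, and `update` would be called once per pixel per frame. Treating each pixel independently, as the text describes, is what makes this per-pixel loop possible.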
Among real-time foreground segmentation methods, those based on TAPPMOGs have gained favor because they can adapt to occasional, persistent scene modifications (such as the moving of a chair or a change in global illumination) while simultaneously modeling parts of the background whose appearance changes frequently, but in a repeating manner (such as a tree swaying in the wind, or pavement moving in and out of shadow due to passing cars). However, TAPPMOG methods rarely, if ever, produce the ideal foreground segmentation desired by an application.
In the context of person-oriented applications relying on static cameras, where we hope that background removal leaves only the people in the scene, TAPPMOG modeling is challenged by a number of phenomena that commonly occur in relatively unconstrained environments such as home living rooms, retail stores, or the outdoors. For example, a person wearing blue jeans and walking on a blue carpet is effectively “camouflaged” to some extent, so that he is difficult to separate from the background model. Failures due to camouflage can be reduced by tightening the differencing method for separating foreground and background, but this makes the system more sensitive to erroneous foreground inclusions caused by shadows, inter-reflections, and subtle lighting variations. Another tradeoff exists in reducing the duration of temporary errors caused by rapid changes in global illumination, in camera gain or position, or in the location of background objects such as furniture. TAPPMOG systems eventually adapt to such changes, but will produce foreground errors in the meantime. Increasing the adaptation rate shortens the time these errors persist, but also causes people to be incorporated into the background model more quickly when they remain in the scene for extended periods of time. Two people who enter the scene and stop to have a conversation will more quickly fade into the background, and at high-traffic regions of the scene, where the true background is frequently obscured by multiple foreground objects, the background model will degrade more quickly. Although TAPPMOGs provide some tolerance to dynamic background objects such as rotating fans, video displays, and foliage or flags waving in the breeze, they usually are not able to model them perfectly, so that these objects sometimes are segmented as foreground.
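The adaptation-rate tradeoff can be made concrete with a small calculation. The sketch below assumes a simple exponential weight update, w <- (1 - alpha) * w + alpha, applied once per frame to the Gaussian that models a newly stationary object, starting from zero weight. The function name `frames_until_absorbed` and the parameter `weight_threshold` (the weight at which the object is taken to have joined the background) are hypothetical and chosen only for illustration; the text does not specify an update rule or a threshold.

```python
import math


def frames_until_absorbed(alpha, weight_threshold):
    """Frames before a stationary object's Gaussian reaches weight_threshold.

    Under the assumed update, the weight after n frames is 1 - (1 - alpha) ** n,
    so the smallest n that reaches the threshold is
    ceil(log(1 - weight_threshold) / log(1 - alpha)).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not 0.0 < weight_threshold < 1.0:
        raise ValueError("weight_threshold must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - weight_threshold) / math.log(1.0 - alpha))


# Compare a slow and a fast adaptation rate for the same (hypothetical) threshold.
slow = frames_until_absorbed(alpha=0.001, weight_threshold=0.3)
fast = frames_until_absorbed(alpha=0.01, weight_threshold=0.3)
print(f"alpha=0.001: {slow} frames, alpha=0.01: {fast} frames")
```

Under these assumptions, the same larger adaptation rate that shortens the errors following an illumination change or a moved chair also shortens, by about the same factor, the time before two people who stop to talk are absorbed into the background.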