Computer vision systems are used to automatically identify and interpret occurrences in a variety of environments. These occurrences may be people, objects or events that are identified by the system as noteworthy and likely candidates for further interpretation and understanding. One common use of computer vision systems is in video surveillance systems, which are generally used to automatically monitor and identify occurrences in, for example, offices, rooms and parking lots. These video surveillance systems usually contain a camera, directed at an area of interest, and a computer vision system that receives and processes a sequence of images from the camera and notifies human operators or other systems of important occurrences.
One important element of these computer vision systems is a background maintenance module that processes the image sequence and maintains a suitable background model throughout the sequence. In general, the image sequence contains several frames, and each frame (a single image that is a collection of individual pixels) is divided into a background, which contains mostly irrelevant details of the frame, and a foreground, which contains significant details and occurrences within the frame. The frame currently being processed is known as the input frame. The background model is a representation of the background and its associated statistics based on properties of the individual pixels. These pixel properties may include, for example, pixel intensity, pixel color and associated statistical properties (such as mean and variance). Background maintenance is the process of maintaining a background model that represents the background accurately enough that the background and the foreground can be distinguished in each frame of the image sequence.
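As an illustration only, a background model built from per-pixel statistics can be sketched as follows. The sketch assumes grayscale frames stored as nested lists of intensities and uses mean and variance as the stored statistics; the function name `build_background_model` is hypothetical and is not taken from any particular system.

```python
from statistics import mean, pvariance

def build_background_model(frames):
    """Return per-pixel (means, variances) grids for a list of grayscale frames.

    Each frame is a list of rows, and each row is a list of pixel intensities.
    This is an assumed, minimal representation of a background model.
    """
    height, width = len(frames[0]), len(frames[0][0])
    means = [[0.0] * width for _ in range(height)]
    variances = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Collect the intensity of this pixel across all frames.
            samples = [frame[r][c] for frame in frames]
            means[r][c] = mean(samples)
            variances[r][c] = pvariance(samples)
    return means, variances

# Three 1x2 frames: the left pixel is nearly constant, the right pixel varies widely.
frames = [[[10, 200]], [[12, 100]], [[14, 0]]]
means, variances = build_background_model(frames)
```

In this sketch the left pixel has a small variance and the right pixel a large one, which is the kind of per-pixel statistic a background model may store.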
Maintenance of the background model is important because the model indicates what the expected background should be, so that the actual background is not marked for further high-level processing (such as interpretation and understanding). Because high-level processing is costly and requires valuable system resources, unnecessarily processing background regions of the image can severely impair the performance of a computer vision system.
Effective background maintenance requires a background model that has properly defined stationarity and appropriate adaptation. Stationarity is a statistical pixel property of the background pixels that a particular background model assumes to be consistent from frame to frame. This statistical pixel property may include, for example, pixel intensity and pixel color. An object in a frame is classified as foreground (and may be further processed) if a statistical pixel property varies significantly from this consistent (or expected) value. Stationarity, however, does not mean the absence of motion, and for optimum performance a background maintenance system should be capable of handling movement in the background. For example, assume that a particular background model defines stationarity in terms of pixel intensity and that the background in an image sequence contains a fluttering leaf on a tree. As each frame in the image sequence is processed, the leaf will move on and off a certain pixel, thereby radically changing the intensity of that pixel from frame to frame. In order to provide proper background maintenance, the stationarity of the background model should be defined to accommodate a range of intensity values that is wide enough to prevent the leaf from constantly being classified as foreground and yet narrow enough to properly capture foreground objects that may appear.
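The trade-off in the fluttering leaf example can be sketched with a simple tolerance band. The sketch below assumes that stationarity is defined as an intensity range of `k` standard deviations around the mean of a pixel's recent history; the names `is_foreground`, `k` and `min_std` and the sample intensities are hypothetical choices made for illustration.

```python
from statistics import mean, pstdev

def is_foreground(intensity, history, k=2.0, min_std=2.0):
    """Classify one pixel as foreground if it falls outside the tolerance band.

    The band is mean +/- k standard deviations of the pixel's history.
    min_std keeps the band from collapsing to zero width for a static pixel.
    """
    mu = mean(history)
    sigma = max(pstdev(history), min_std)
    return abs(intensity - mu) > k * sigma

# A pixel that the leaf moves on and off (dark leaf, bright sky).
leaf_pixel = [60, 180, 60, 180, 60, 180]
# A pixel on a static wall.
wall_pixel = [100, 101, 99, 100, 100, 100]

leaf_is_foreground = is_foreground(60, leaf_pixel)      # leaf stays background
bright_is_foreground = is_foreground(250, leaf_pixel)   # outside the wide band
missed_object = is_foreground(200, leaf_pixel)          # inside the wide band
wall_change = is_foreground(130, wall_pixel)            # narrow band catches it
```

The wide band at the leaf pixel keeps the leaf in the background, but it also hides an object of intensity 200 at that pixel, which is the cost of making the band too wide.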
Adaptation is the ability of a background maintenance system to adapt to both sudden and gradual changes in the background. When these changes occur, the current background model being used by the system may become unsuitable because it may either omit part of the background or include some of the foreground. An adaptive background maintenance system is able to produce a new background model that includes the changed background. Further, an adaptive background maintenance system is able to incorporate into the new background model those objects that are initially classified as foreground but that regain stationarity. For example, suppose that an image sequence contains a chair that is part of the background. If the chair is nudged or otherwise momentarily set into motion, the background maintenance system may initially classify the chair as foreground, even though in reality it is part of the background. Once the chair comes to rest, the pixels representing the chair regain statistical stationarity. An adaptive background maintenance system would reclassify the still chair as background instead of permanently classifying the chair as foreground.
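The chair example can be sketched for a single pixel using an exponential moving average, which is one common adaptation scheme and is assumed here purely for illustration. The function name `run_adaptive_model`, the learning rate `alpha` and the `threshold` value are hypothetical.

```python
def run_adaptive_model(intensities, alpha=0.2, threshold=20.0):
    """Return per-frame foreground flags for one pixel while the model adapts.

    The background estimate is an exponential moving average of the pixel's
    intensity, so a value that persists is gradually absorbed into the model.
    """
    background = float(intensities[0])
    flags = []
    for value in intensities:
        # Classify against the current model, then update the model.
        flags.append(abs(value - background) > threshold)
        background = (1 - alpha) * background + alpha * value
    return flags

# Five frames of the original background, then the chair rests at a new
# position and the pixel holds a new, stable intensity.
pixel_sequence = [50] * 5 + [120] * 20
flags = run_adaptive_model(pixel_sequence)
```

With these assumed parameters the pixel is flagged as foreground for the first few frames after the change and is then reclassified as background once the estimate has moved close to the new intensity.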
Most background maintenance systems process individual pixels independently of other pixels. The capability of this pixel processing, however, is limited, and certain types of foreground objects can be missed. For example, when a homogeneously colored foreground object moves, pixel processing may not include the entire foreground object as foreground because the pixel processing cannot detect change in the interior pixels of the object. This is because pixel processing looks at isolated pixels and does not evaluate the neighboring sets of pixels. Such systems cannot properly account for changes of this kind because they occur on a regional scale and not merely on an individual pixel scale. In order to achieve accurate, efficient and adaptive background maintenance, processing of the image sequence should occur on a regional scale using relationships between pixels.
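The interior-pixel limitation can be illustrated with a one-dimensional row of pixels. The sketch assumes simple frame differencing as the pixel-level test, which is only one possible form of pixel processing; the helper names `make_row` and `changed_pixels` are hypothetical.

```python
def make_row(width, start, length, value=200):
    """Build a row of zeros containing a uniform object of the given length."""
    return [value if start <= i < start + length else 0 for i in range(width)]

def changed_pixels(previous_row, current_row, threshold=30):
    """Flag each pixel independently if its intensity changed between frames."""
    return [abs(c - p) > threshold for p, c in zip(previous_row, current_row)]

# A uniform object five pixels wide moves one pixel to the right.
previous_row = make_row(10, 2, 5)   # object covers indices 2..6
current_row = make_row(10, 3, 5)    # object covers indices 3..7

mask = changed_pixels(previous_row, current_row)
detected = [i for i, flag in enumerate(mask) if flag]
```

Only the trailing edge (index 2) and the leading edge (index 7) are flagged. The interior pixels at indices 3 to 6 keep the same intensity in both frames, so a test that examines each pixel in isolation does not mark them.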
In some cases, background changes may be significant and widespread. For example, sudden changes in illumination (such as when lights are turned on in a dark room) may drastically change the objects seen in a frame and can require remodeling of the background. Most background maintenance systems, however, use pixel processing, whereby each individual pixel in a frame is considered as an independent entity (i.e., independent of other pixels). These types of systems will not recognize the need for a new background model and will assign all or most of the frame as foreground. These systems cannot properly account for such a global change because the change occurs on a frame-wide scale and not merely on an individual pixel scale. In order to achieve accurate, efficient and adaptive background maintenance, these frame-wide changes should be accounted for by processing not just individual pixels but also the entire frame.
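One way a frame-wide check might look is sketched below. It assumes that a global change is declared when the fraction of pixels differing from the background exceeds a cutoff, and that the model is then replaced by the current frame. The function names, the `global_fraction` value of 0.7 and the replacement strategy are all hypothetical.

```python
def foreground_fraction(frame, background, pixel_threshold=25):
    """Return the fraction of pixels that differ from the background model."""
    changed = 0
    total = 0
    for frame_row, background_row in zip(frame, background):
        for value, expected in zip(frame_row, background_row):
            total += 1
            if abs(value - expected) > pixel_threshold:
                changed += 1
    return changed / total

def maintain_background(frame, background, pixel_threshold=25, global_fraction=0.7):
    """Return (background, rebuilt), rebuilding the model on a frame-wide change."""
    fraction = foreground_fraction(frame, background, pixel_threshold)
    if fraction >= global_fraction:
        # Nearly every pixel changed: treat this as a global change
        # (for example, lights switched on) and start a new model.
        return [row[:] for row in frame], True
    return background, False

dark_room = [[10, 12], [11, 9]]
lights_on = [[150, 160], [155, 149]]
one_object = [[10, 12], [200, 9]]

new_model, rebuilt = maintain_background(lights_on, dark_room)
same_model, rebuilt_for_object = maintain_background(one_object, dark_room)
```

In this sketch the illumination change triggers a new model, while a single changed pixel leaves the model untouched so that the object can still be reported as foreground.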