Background-foreground segmentation is a well-known computer vision technique for detecting objects in the field of view of a stationary camera. Initially, a system learns a scene during a training phase while no objects are present, building a background model of the scene from a sequence of images captured during that phase. Thereafter, during normal operation, new images are compared with the background model. Pixel positions that deviate significantly from the background model are classified as foreground pixels, while the remaining pixels are labeled as background pixels. The output of the algorithm is generally a binary image depicting the silhouette of the foreground objects found in the scene.
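By way of illustration only, the training and classification steps described above may be sketched as follows. The sketch assumes a per-pixel mean and standard deviation as the background model and a fixed multiple of the standard deviation as the test for "significant deviation"; neither choice is specified above, and the names `build_background_model`, `segment`, `k`, and `min_std` are hypothetical.

```python
from statistics import mean, pstdev


def build_background_model(training_frames):
    """Learn a per-pixel mean and standard deviation from object-free frames.

    Each frame is a list of rows of grayscale intensities. The per-pixel
    Gaussian-style model is an assumption made for this sketch.
    """
    height, width = len(training_frames[0]), len(training_frames[0][0])
    means = [[0.0] * width for _ in range(height)]
    stds = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            samples = [frame[y][x] for frame in training_frames]
            means[y][x] = mean(samples)
            stds[y][x] = pstdev(samples)
    return means, stds


def segment(frame, model, k=3.0, min_std=2.0):
    """Return a binary mask: 1 for foreground pixels, 0 for background pixels.

    A pixel is foreground when it deviates from the background mean by more
    than k standard deviations. min_std (hypothetical) keeps near-constant
    pixels from being flagged by sensor noise alone.
    """
    means, stds = model
    return [
        [1 if abs(pixel - mu) > k * max(sigma, min_std) else 0
         for pixel, mu, sigma in zip(row, mu_row, sigma_row)]
        for row, mu_row, sigma_row in zip(frame, means, stds)
    ]


# Training phase: three 2x2 frames of the empty scene (made-up values).
training = [
    [[100, 101], [50, 52]],
    [[102, 99], [51, 50]],
    [[101, 100], [49, 51]],
]
model = build_background_model(training)

# Normal operation: the top-right pixel is now much brighter than the model.
new_frame = [[101, 180], [50, 51]]
mask = segment(new_frame, model)
print(mask)
```

In this sketch only the top-right pixel departs from the learned background, so it is the only position marked as foreground in the binary mask. A fixed per-pixel model of this kind has no mechanism for adapting to background objects that move after training.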
Conventional background-foreground segmentation techniques perform well for segmenting and tracking people and other objects in open outdoor areas, such as a parking lot, or enclosed, spacious facilities, such as warehouses, office spaces, or subway platforms. These scenes, however, are quite different from those of a typical home. For example, a residential environment typically contains many objects in a small area. In addition, many objects in a residential environment are non-rigid, such as garments and curtains, or deformable, such as furniture and blinds (or both). People also tend to vary their pose frequently in a residential environment, for example among standing, sitting, and lying down positions.
Most existing background-foreground segmentation techniques do not perform well in the presence of occlusions of lower body parts in cluttered environments, non-upright body poses, and spontaneous movement of large background objects such as doors, chairs, and tables. A need therefore exists for a method and apparatus for generating and maintaining improved background models for use in background-foreground segmentation.