The proliferation of cheap sensors and increased processing power have made the acquisition and processing of video information more practical and economical. Real-time video analysis tasks such as object detection and tracking can increasingly be performed efficiently on standard PCs for a variety of applications, such as industrial automation, transportation, automotive, security and surveillance, and communications. The use of stationary cameras is fairly common in a number of applications.
Background modeling and subtraction is a core component in motion analysis. A central idea behind such a module is to create a probabilistic representation of the static scene that is compared with the current input to perform subtraction. Such an approach is efficient when the scene to be modeled corresponds to a static structure with limited perturbation.
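As an illustration of the idea described above, the following is a minimal sketch, not the method of any cited work, assuming a grayscale scene modeled by an independent Gaussian at each pixel. The function names `fit_background` and `subtract`, the threshold `k`, and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np

def fit_background(frames):
    """Estimate a per-pixel Gaussian (mean, std) from frames of the empty scene."""
    stack = np.asarray(frames, dtype=np.float64)  # shape (T, H, W)
    mean = stack.mean(axis=0)
    std = stack.std(axis=0) + 1e-3  # small floor avoids division by zero
    return mean, std

def subtract(frame, mean, std, k=3.0):
    """Mark pixels that deviate from the model by more than k standard deviations."""
    return np.abs(np.asarray(frame, dtype=np.float64) - mean) / std > k

# Synthetic static scene: constant intensity 100 with mild sensor noise.
rng = np.random.default_rng(0)
training = 100.0 + rng.normal(0.0, 2.0, size=(50, 8, 8))
mean, std = fit_background(training)

# Current input: same scene plus a bright 2x2 synthetic object.
frame = 100.0 + rng.normal(0.0, 2.0, size=(8, 8))
frame[2:4, 2:4] = 200.0
mask = subtract(frame, mean, std)  # boolean foreground mask
```

A model of this kind works only under the condition stated above: the scene must be static with limited perturbation, so that a single mean and variance per pixel describe it adequately.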
Background subtraction is a core component in many surveillance applications where the objective is to separate the foreground from the static parts of the scene. The information provided by such a module serves as a valuable low-level visual cue for high-level object analysis tasks such as object detection, tracking, classification and event analysis. See, for example, Remagnino, P., G. Jones, N. Paragios, and C. Regazzoni: Video-Based Surveillance Systems: Computer Vision and Distributed Processing, Kluwer Academic Publishers, 2001; Mittal, A.
203, 2003; Grimson, W., C. Stauffer, R. Romano, and L. Lee, ‘Using adaptive tracking to classify and monitor activities in a site’, IEEE International Conference on Computer Vision and Pattern Recognition. Santa Barbara, Calif., 1998; Ivanov, Y. and A. Bobick, ‘Recognition of Multi-Agent Interaction in Video Surveillance’, IEEE International Conference on Computer Vision. Kerkyra, Greece, pp. 169-176, 1999; Cohen, I. and G. Medioni, ‘Detecting and Tracking Moving Objects in Video Surveillance’, IEEE International Conference on Computer Vision and Pattern Recognition. Ft. Collins, Colo., pp. II: 319-325; Boult, T., R. Micheals, X. Gao, and M. Eckmann, ‘Into the woods: visual surveillance of non-cooperative and camouflaged targets in complex outdoor settings’, Proceedings of the IEEE pp. 1382-1402, 2001; Stauffer, C. and W. Grimson, ‘Learning Patterns of Activity Using Real-Time Tracking’, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8), 747-757, 2000; and Collins, R., A. Lipton, H. Fujiyoshi, and T. Kanade, ‘Algorithms for Cooperative Multi-Sensor Surveillance’, Proceedings of the IEEE 89(10), 1456-1477, 2001.
A basis for the development of the subspace method for scene prediction is found in the work of Soatto et al. (Soatto, S., G. Doretto, and Y. Wu, ‘Dynamic Textures’, IEEE International Conference on Computer Vision. Vancouver, Canada, pp. II: 439-446, 2001) and in the publication by Doretto, G., A. Chiuso, Y. Wu, and S. Soatto, ‘Dynamic Textures’, International Journal of Computer Vision 51(2), 91-109, 2003, including an implementation of their algorithm and an implementation of Incremental PCA due to Silviu Minut.
To this end, one has to obtain a representation of the background, update this representation over time and compare it with the actual input to determine areas of discrepancy.
Such methods have to be adaptive and able to deal with gradual changes of the illumination and scene conditions. Methods for background modeling may be classified into two types: predictive and statistical.
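The obtain, update and compare cycle, together with the requirement of adapting to gradual illumination change, can be sketched with a simple running-average model. This is a generic illustration under stated assumptions, not a method taken from the cited literature: the class name `RunningBackground`, the learning rate `alpha` and the fixed `threshold` are hypothetical, and the scene is a synthetic grayscale image whose brightness drifts slowly.

```python
import numpy as np

class RunningBackground:
    """Per-pixel running-average background model (illustrative sketch)."""

    def __init__(self, first_frame, alpha=0.05, threshold=20.0):
        self.model = np.asarray(first_frame, dtype=np.float64).copy()
        self.alpha = alpha          # learning rate: larger values adapt faster
        self.threshold = threshold  # intensity difference treated as foreground

    def apply(self, frame):
        frame = np.asarray(frame, dtype=np.float64)
        # Compare: pixels far from the current model are foreground.
        mask = np.abs(frame - self.model) > self.threshold
        # Update: blend the new frame into the model at background pixels only,
        # so that detected objects are not absorbed into the background.
        blended = (1.0 - self.alpha) * self.model + self.alpha * frame
        self.model = np.where(mask, self.model, blended)
        return mask

# Gradual illumination change: brightness rises by 0.5 per frame.
bg = RunningBackground(np.full((8, 8), 100.0))
drift_false_alarms = 0
for t in range(1, 101):
    frame = np.full((8, 8), 100.0 + 0.5 * t)
    drift_false_alarms += int(bg.apply(frame).sum())

# An object entering the drifted scene is still detected.
frame = np.full((8, 8), 100.0 + 0.5 * 101)
frame[3:5, 3:5] += 80.0
object_mask = bg.apply(frame)
```

In this sketch the model lags the slow drift by less than the threshold, so the illumination change raises no false alarms while the abrupt object does. The same mechanism offers no such guarantee when the background itself changes quickly from frame to frame.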
Existing methods in the literature can effectively describe scenes that have a smooth behavior and limited variation. Consequently, they are able to cope with gradually evolving scenes. However, their performance deteriorates (FIG. 2) when the scene to be described is dynamic and exhibits non-stationary properties in time. Examples of such scenes are shown in FIGS. 1 and 10 and include ocean waves, waving trees, rain, moving clouds, etc.