Existing methods for persistent surveillance tasks, such as automatic target detection and tracking, often rely on rectangular windows that contain pixels from both the target and the background. As a result, target appearance models used for such tasks must essentially be trained against all possible backgrounds. Techniques for modeling the background often require the camera to be static, require a training sequence with no foreground objects, or cannot handle complex foreground motions. Methods for automatic tracking that use sparse features, such as SIFT or SURF keypoints, often fail in low- to medium-resolution settings. Alternatively, dense optical flow-based methods for tracking are highly localized and make strict assumptions about the number of motion layers in the scene as well as smoothness within motion layers.
While there has been much work on developing automated sensor algorithms for location, classification, identification, and tracking of targets and potential threats, the performance of state-of-the-art (SOA) methods is far from ideal, even under favorable conditions. In particular, false alarm and missed detection rates remain high for low-resolution images, small-scale targets, and nonstationary cameras, as well as in the presence of occlusions, moving clutter, and adverse weather conditions (see, for example, the List of Incorporated Literature References, Reference No. 1).
Thus, a continuing need exists for a scene analysis system that provides pixel-accurate boundaries of complex foreground and moving background, and that greatly improves performance on recognition and detection tasks by eliminating extraneous background features.