FIGS. 1A and 1B illustrate two image frames from a low-resolution digital camera at Mammoth Hot Springs, in Yellowstone National Park, taken 67 seconds apart. Even with a careful inspection of image frames 1 and 2, it is difficult to detect any changes between the two images. Automated processing has been used with remote staring sensors (e.g., webcams, security cameras, surveillance cameras, radiometric cameras, traffic cameras, etc.) to detect subtle differences between frames. The automated techniques have traditionally relied on some form of background suppression to detect transient events (changes) observed by staring sensors: to detect events initiating at time t, one begins with the image frame corresponding to t and subtracts an estimate of the “background” signal existing prior to this time. This background estimate may be a single prior frame, a reference image containing no moving objects, or a composite computed from a recent history of frames. For example, the image frames 1 and 2 in FIGS. 1A and 1B, respectively, show the same scene. To detect change, one might subtract one from the other (e.g., FRAME 2-FRAME 1) and generate a difference frame. This difference frame is intended to eliminate the static background and highlight dynamic components in the frames. If the frame and background estimate are not spatially registered to one another prior to subtraction, much of the signal in the difference frame is due to scene gradients, rather than actual changes in the scene (see FIG. 1C). When image frame 2 and background frame 1 are properly registered (to a small fraction of a pixel) prior to subtraction, true changes in the scene stand out more clearly (see FIG. 1D). Unfortunately, sub-pixel registration is often not computationally feasible at sensor frame rates, and is not robust to pixel defects.
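The effect of registration on a difference frame can be illustrated with a minimal sketch. The code below is an illustration under stated assumptions, not the processing used to produce FIGS. 1C and 1D: the scene is a synthetic intensity ramp, the misregistration is a known whole-pixel shift, and the helper names `difference_frame` and `shift_columns` are hypothetical. True sub-pixel registration would additionally require estimating the shift and interpolating between pixels.

```python
def difference_frame(frame, background):
    """Pixel-wise subtraction of a background estimate from a frame."""
    return [[f - b for f, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]


def shift_columns(frame, dx):
    """Shift a frame right by dx whole pixels, replicating the edge column."""
    width = len(frame[0])
    return [[row[min(max(x - dx, 0), width - 1)] for x in range(width)]
            for row in frame]


# Assumed synthetic scene: a horizontal intensity ramp (a strong scene gradient).
frame1 = [[10 * x for x in range(8)] for _ in range(4)]

# Frame 2 is the same scene displaced by one pixel, plus one true change.
frame2 = shift_columns(frame1, 1)
frame2[2][5] += 25

# Without registration, the scene gradient dominates the difference frame.
unregistered = difference_frame(frame2, frame1)

# With the background aligned to the frame first, only the true change remains.
registered = difference_frame(frame2, shift_columns(frame1, 1))

for name, diff in (("unregistered", unregistered), ("registered", registered)):
    print(name)
    for row in diff:
        print(row)
```

In the unregistered result nearly every pixel carries a nonzero value caused purely by the ramp, while in the registered result the single altered pixel is the only nonzero entry.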
Detecting “true” motion in a scene is a key problem in many security and surveillance applications. In applications that use staring sensors (e.g., fixed cameras), the background scene is static, so background suppression approaches such as frame subtraction are commonly used to detect object motion or other scene changes. An intensity difference can be computed between two successive frames, between a frame and a reference image containing no moving objects, or between a frame and a composite background estimate computed from a recent history of sensor frames. The computed intensity difference is then compared against a threshold to detect a motion event within the staring sensor's field of view.
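A thresholded intensity difference against a composite background might be sketched as follows. This is only one possible reading of the approach described above: the per-pixel mean is an assumed choice of composite, the threshold value is arbitrary, and the function names `composite_background` and `detect` are hypothetical.

```python
def composite_background(history):
    """Per-pixel mean over a recent history of frames (one possible composite)."""
    count = len(history)
    rows, cols = len(history[0]), len(history[0][0])
    return [[sum(frame[r][c] for frame in history) / count for c in range(cols)]
            for r in range(rows)]


def detect(frame, background, threshold):
    """Return (row, col) of pixels whose |frame - background| exceeds threshold."""
    return [(r, c)
            for r, row in enumerate(frame)
            for c, value in enumerate(row)
            if abs(value - background[r][c]) > threshold]


# Assumed history: three small frames of a static scene with slight noise.
history = [
    [[100, 100, 100], [100, 100, 100]],
    [[102, 100, 98], [100, 101, 100]],
    [[98, 100, 102], [100, 99, 100]],
]

# Current frame: one pixel has brightened well beyond the noise level.
current = [[100, 100, 100], [100, 140, 100]]

background = composite_background(history)
hits = detect(current, background, threshold=10)
print(hits)
```

A single fixed threshold is the simplest choice; it treats every pixel alike and therefore cannot account for pixels whose background value is inherently more variable.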
Significant problems arise whenever the statistical properties of the background signal change abruptly, typically due to sensor or platform jitter. Pixels located in regions of high scene gradient can change substantially with increased jitter, leading to myriad “clutter” false detections whenever the jitter level exceeds that observed in the training data used to estimate the background. A false detection is the erroneous determination that a change detected within a staring sensor's field of view is a true scene change, when it is in fact due to background jitter or noise. For many staring sensors, scene-induced “clutter” is the single largest source of false alarms, limiting both run-time performance and detection sensitivity.
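The way jitter turns scene gradient into clutter can be shown with a small one-dimensional sketch. Everything here is an assumption made for illustration: the scene is a synthetic step edge, jitter is modeled as a sub-pixel shift with linear interpolation, the per-pixel thresholds are set from the largest residual seen at a low "training" jitter level (scaled by an arbitrary margin plus a small noise floor), and the function names are hypothetical.

```python
def sample_shifted(scene, shift):
    """Sample a 1-D scene displaced by `shift` pixels using linear interpolation."""
    last = len(scene) - 1
    shifted = []
    for x in range(len(scene)):
        pos = min(max(x - shift, 0.0), float(last))  # clamp at the borders
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        shifted.append((1 - frac) * scene[lo] + frac * scene[hi])
    return shifted


def max_abs_residual(scene, shifts):
    """Largest |shifted - reference| at each pixel over a set of jitter shifts."""
    residual = [0.0] * len(scene)
    for shift in shifts:
        frame = sample_shifted(scene, shift)
        residual = [max(r, abs(f - v)) for r, f, v in zip(residual, frame, scene)]
    return residual


# Assumed scene: two flat regions separated by a sharp edge (high gradient).
scene = [10.0] * 5 + [200.0] * 5

training_shifts = [-0.1, 0.0, 0.1]      # jitter level seen during training
operational_shifts = [-0.5, 0.0, 0.5]   # larger jitter encountered later

# Per-pixel thresholds derived from the training residuals.
margin, noise_floor = 1.5, 1.0
trained = max_abs_residual(scene, training_shifts)
thresholds = [margin * t + noise_floor for t in trained]

# No object has moved, yet the edge pixels now exceed their thresholds.
observed = max_abs_residual(scene, operational_shifts)
false_detections = [x for x, (o, t) in enumerate(zip(observed, thresholds)) if o > t]
print(false_detections)
```

The flat regions are unaffected by either jitter level, while the two pixels adjacent to the edge exceed thresholds that were adequate during training, which is the clutter behavior described above.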
Outdoor applications pose additional challenges because the background scene itself may not be stationary, owing to natural events such as foliage moving in the wind, moving water, precipitation, or other phenomena. In these applications, conventional techniques have used filters to predict intensity values in the presence of dynamic backgrounds. Such predictors have the advantage of being able to learn repetitive patterns and thus detect moving objects. However, even these adaptive techniques often fail in the presence of camera jitter.