It is well known that detecting moving objects in streaming video is a significant and difficult research problem. Motion in a video of a typical real-life environment may be categorized as either interesting (salient) or uninteresting (distracting). Salient motion is normally defined as motion from a transient object in the scene, such as a person or a vehicle. Distracting motion is oscillatory or random background motion, e.g., leaves swaying in the wind. Since salient motion is typically what is of interest in a particular scene, distracting motion complicates salient motion detection.
Background subtraction is one conventional approach to effectively detect moving objects in a scene with a stationary background. However, where the scene is dynamic with a non-stationary background, detecting moving objects is more difficult. Adaptive background subtraction has been developed to handle a non-stationary background. For example, Ren et al., “Motion Detection with Non-stationary Background,” International Proceedings of the 11th International Conference on Image Analysis and Processing, 2001, pp. 78-83, teaches a Spatial Distribution of Gaussians (SDG) model to detect and approximately extract moving objects using motion compensation. Ren et al. demonstrates the capability of detecting small moving objects against a highly textured background with pan-tilt camera motion. In another example, Stauffer et al., “Adaptive Background Mixture Models for Real-time Tracking,” CVPR99, June 1999, teaches modeling each pixel as a mixture of Gaussians and using an on-line approximation to update the model. Stauffer et al. can deal with lighting changes, slow-moving objects, and objects being introduced into or removed from the scene. In yet another example, Monnet et al., “Background Modeling and Subtraction of Dynamic Scenes,” International Proceedings of International Conference on Computer Vision (ICCV), 2003, pp. 1305-1312, teaches prediction-based online modeling of dynamic scenes. Monnet et al. has been somewhat effective on coastline scenes with ocean waves and on pastoral scenes with swaying trees. Unfortunately, these approaches all require extensive learning of the background model from hundreds of images, or frames, of the scene background without moving objects. Further, it is difficult to detect objects of interest in an area dominated by distracting motion, e.g., in an ocean scene, especially if the distracting motion has the same general direction as the objects, e.g., the same direction as the ocean waves.
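The per-pixel mixture idea can be illustrated with a minimal sketch. It is a heavily simplified, single-pixel, grayscale approximation and not the exact formulation of Stauffer et al.: the class name `PixelMixture`, all parameter values, the use of one learning rate for weights, means and variances, and the plain weight threshold for deciding which components count as background are assumptions made for illustration only.

```python
import math


class PixelMixture:
    """Simplified mixture-of-Gaussians model for one grayscale pixel.

    All defaults are hypothetical illustration values.
    """

    def __init__(self, k=3, alpha=0.05, init_var=100.0, min_var=4.0,
                 match_sigmas=2.5, bg_weight=0.2):
        self.k = k                        # maximum number of components
        self.alpha = alpha                # learning rate
        self.init_var = init_var          # variance given to a new component
        self.min_var = min_var            # floor that keeps variances positive
        self.match_sigmas = match_sigmas  # match window in standard deviations
        self.bg_weight = bg_weight        # weight needed to count as background
        self.components = []              # each entry is [weight, mean, variance]

    def _match(self, value):
        """Return the first component whose match window contains value."""
        for comp in self.components:
            if abs(value - comp[1]) <= self.match_sigmas * math.sqrt(comp[2]):
                return comp
        return None

    def is_background(self, value):
        """True if value matches a component with enough accumulated weight."""
        comp = self._match(value)
        return comp is not None and comp[0] >= self.bg_weight

    def update(self, value):
        """On-line update of the mixture with one new pixel value."""
        matched = self._match(value)
        # Decay every weight; only the matched component is reinforced.
        for comp in self.components:
            hit = 1.0 if comp is matched else 0.0
            comp[0] = (1.0 - self.alpha) * comp[0] + self.alpha * hit
        if matched is not None:
            diff = value - matched[1]
            matched[1] += self.alpha * diff
            new_var = (1.0 - self.alpha) * matched[2] + self.alpha * diff * diff
            matched[2] = max(new_var, self.min_var)
        else:
            # No match: replace the weakest component with a new one.
            if len(self.components) >= self.k:
                weakest = min(range(len(self.components)),
                              key=lambda i: self.components[i][0])
                del self.components[weakest]
            self.components.append([self.alpha, float(value), self.init_var])
        total = sum(comp[0] for comp in self.components)
        for comp in self.components:
            comp[0] /= total


# Hypothetical pixel that alternates between two intensities (e.g., a swaying leaf).
samples = [150.0 if t % 2 == 0 else 50.0 for t in range(200)]
model = PixelMixture()
for value in samples[:4]:
    model.update(value)
early_leaf_is_background = model.is_background(50.0)  # too few frames so far
for value in samples[4:]:
    model.update(value)
```

In this sketch, both intensity modes are accepted as background only after the model has seen enough frames, while a previously unseen value such as 220 is still reported as foreground. This mirrors, on a very small scale, the need for a learning period noted above.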
As shown above, background subtraction has not proven particularly effective and frequently provides false positives. False positives have been especially frequent for environments that include objects with distracting motion, e.g., specularities on water, vegetation in the wind, etc. For example, application of background subtraction to a person walking in front of oscillating branches on a windy day detects scene movement for both the person and the moving leaves. See, e.g., Horprasert et al., “A Statistical Approach for Real-Time Robust Background Subtraction and Shadow Detection,” Proceedings of IEEE Frame-Rate Workshop, Kerkyra, Greece, 1999.
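The false-positive behavior can be reproduced with a minimal sketch. It uses a simple exponential running-average background model as an assumed stand-in, not the method of any reference cited above; the four-pixel scene, the intensity values, the learning rate and the threshold are all hypothetical.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the background estimate (running average)."""
    return [(1.0 - alpha) * b + alpha * f for b, f in zip(background, frame)]


def foreground_mask(background, frame, threshold=30.0):
    """Mark pixels whose difference from the background exceeds threshold."""
    return [1 if abs(f - b) > threshold else 0
            for b, f in zip(background, frame)]


def scene(t, person_at=None):
    """Hypothetical 4-pixel frame; pixel 1 is a leaf alternating 150 / 50."""
    frame = [100.0, 150.0 if t % 2 == 0 else 50.0, 100.0, 100.0]
    if person_at is not None:
        frame[person_at] = 220.0  # a person enters at this pixel
    return frame


# Learn the background from 200 frames that contain no salient object.
background = scene(0)
for t in range(1, 200):
    background = update_background(background, scene(t))

# A person appears at pixel 3 while the leaf keeps swaying at pixel 1.
mask = foreground_mask(background, scene(200, person_at=3))
print(mask)  # [0, 1, 0, 1]: the leaf is flagged along with the person
```

The averaged background for the leaf pixel settles near the midpoint of its two intensities, so every frame of the leaf differs from the model by more than the threshold and is reported as motion.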
Finding the temporal difference in a video scene has proven to be the simplest approach to extracting moving objects and, also, to adapting to a dynamic environment. Unfortunately, temporal differencing does not detect the entire shape of a moving object with uniform intensity, because the interior pixels of such an object do not change from frame to frame. Hybrid change detectors have combined temporal difference imaging and adaptive background estimation to detect regions of change. For example, Huwer et al., “Adaptive Change Detection for Real-time Surveillance Applications,” International Proceedings of the 3rd IEEE Workshop on Visual Surveillance, 2000, pp. 37-45, teaches combining temporal differencing with adaptive background subtraction to handle lighting changes.
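The shape limitation of temporal differencing can be seen in a minimal sketch. The one-row frames, the intensity values, the threshold and the helper names are hypothetical and chosen only to illustrate the point.

```python
def temporal_difference(prev_frame, curr_frame, threshold=10):
    """Binary change mask: 1 where |curr - prev| exceeds threshold."""
    return [
        [1 if abs(c - p) > threshold else 0
         for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def make_frame(width, start, length, background=0, foreground=200):
    """One-row frame containing a uniform-intensity object (test data)."""
    row = [background] * width
    for x in range(start, start + length):
        row[x] = foreground
    return [row]


prev_frame = make_frame(10, start=2, length=4)  # object covers x = 2..5
curr_frame = make_frame(10, start=3, length=4)  # object covers x = 3..6
mask = temporal_difference(prev_frame, curr_frame)
print(mask)  # [[0, 0, 1, 0, 0, 0, 1, 0, 0, 0]]
```

Only the trailing edge (x = 2) and the leading edge (x = 6) of the object are marked as changed; the interior pixels at x = 3..5 keep the same intensity in both frames and are missed.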
These prior art motion detection approaches still cannot handle quick image variations, e.g., a light turning on or off. Prior art adaptive background subtraction methods, in particular, require hundreds of images to learn the background model, do not handle stationary objects in the scene that start to move, and cannot handle quick image variations or large distracting motion.
A limited example of salient motion detection is taught by Wildes, “A Measure of Motion Salience for Surveillance Applications,” International Proceedings of IEEE International Conference on Image Processing, pp. 183-187, 1998. Wildes teaches using spatiotemporal filtering to measure motion salience. To accommodate the velocity-dependent nature of spatiotemporal filters, Wildes treats the moving objects as moving with a certain velocity, and the method has been effective on rapidly moving objects. However, Wildes does not work for slow-moving objects. Wixson, “Detecting Salient Motion by Accumulating Directionally-Consistent Flow,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 8, pp. 774-779, August 2000, teaches accumulating directionally-consistent flow to detect salient motion. Wixson calculates subpixel optical flow and integrates frame-to-frame optical flow over time for each pixel to compute a rough estimate of the total image distance the pixels have moved. Wixson updates a salience measure on each frame. The Wixson salience measure is directly related to the distance over which a point has traveled with a consistent direction. However, Wixson has proven very time consuming, and objects leave salience “trails” in the results.
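The intuition behind a salience measure tied to distance traveled in a consistent direction can be shown with a minimal sketch. This is not Wixson's algorithm: it merely sums hypothetical frame-to-frame flow vectors for a single tracked point and takes the magnitude of the result, and the flow values and function name are assumptions for illustration.

```python
import math


def net_displacement(flows):
    """Magnitude of the summed (dx, dy) frame-to-frame flow vectors."""
    sum_x = sum(dx for dx, _ in flows)
    sum_y = sum(dy for _, dy in flows)
    return math.hypot(sum_x, sum_y)


# Hypothetical flow histories over 20 frames, in pixels per frame.
walker = [(0.5, 0.0)] * 20                      # slow but consistent motion
leaf = [(2.0, 0.0) if t % 2 == 0 else (-2.0, 0.0)
        for t in range(20)]                     # fast but oscillating motion

walker_salience = net_displacement(walker)
leaf_salience = net_displacement(leaf)
print(walker_salience, leaf_salience)  # 10.0 0.0
```

Although the leaf moves four times faster per frame than the walker, its back-and-forth flow cancels out, whereas the walker's small but consistent flow accumulates into a large net displacement.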
Thus, there is a need for detecting objects moving through a video scene with salient motion, even in the presence of large objects with distracting motion, while ignoring objects in the scene moving with distracting motion and, especially, without requiring large numbers of images or frames to identify stationary or background objects, regardless of quick object variations.