1. Field of the Invention
The present invention relates to a method and apparatus for foreground segmentation of video sequences using features derived in the transform domain to adaptively model the background and to perform segmentation based on the difference between the features of the current frame and those of the background model. The present invention can be used, for example, in video surveillance and in video encoding.
2. Description of Related Art
The task of foreground segmentation of video sequences is to label the regions in an image as moving objects or as background. It is a fundamental step in many vision systems, including video surveillance and human-machine interfaces, and it is useful in low-bandwidth telecommunications. Accurate segmentation is difficult due to such factors as illumination variation, occlusion, background movements, and noise. The challenges facing the segmentation task include: illumination variation, which may be gradual or sudden, and global or local; background change due to moved background objects and small repetitive movements such as swaying trees and flickering screens; foreground aperture, due to the difficulty in detecting interior motion of an object with homogeneous color; bootstrapping, when background frames for training are not available; camouflage, when the foreground is very similar to the background; and shadows cast by the foreground objects. In addition, a complex segmentation algorithm may be difficult to implement for real-time operation.
Many approaches have been proposed to segment the foreground moving objects in video sequences. Information about the temporal evolution of pixel values is commonly used. Typically, a new frame is subtracted from a reference image and a threshold is then applied to the difference. These approaches differ in the type of background model used and in the procedure used to update the model. A comprehensive comparison can be found in K. Toyama, J. Krumm, et al., Wallflower: Principles and Practice of Background Maintenance, ICCV99, 1999, 255-261.
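By way of illustration only, the subtract-and-threshold step described above can be sketched as follows. This is a minimal sketch and not the method of any cited reference: the function names, the threshold value, and the running-average update rule (one of many possible update procedures) are assumptions introduced for this example, and frames are represented as plain lists of grayscale rows.

```python
def segment_foreground(frame, reference, threshold):
    """Return a binary mask: 1 where |frame - reference| > threshold, else 0."""
    return [
        [1 if abs(p - r) > threshold else 0 for p, r in zip(frame_row, ref_row)]
        for frame_row, ref_row in zip(frame, reference)
    ]


def update_reference(reference, frame, mask, rate):
    """Blend background-labelled pixels into the reference (running average).

    Assumption: pixels labelled foreground leave the reference unchanged.
    """
    return [
        [r if m else (1.0 - rate) * r + rate * p
         for p, r, m in zip(frame_row, ref_row, mask_row)]
        for frame_row, ref_row, mask_row in zip(frame, reference, mask)
    ]


# Hypothetical 2x3 grayscale reference image and new frame.
reference = [[10, 10, 10], [10, 10, 10]]
frame = [[12, 90, 11], [10, 95, 9]]

mask = segment_foreground(frame, reference, threshold=25)
reference = update_reference(reference, frame, mask, rate=0.5)
```

In this sketch the middle column is labelled foreground because its difference from the reference exceeds the threshold, while the remaining pixels are blended into the reference. A fixed global threshold of this kind is what makes such schemes sensitive to the illumination changes discussed above.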
To make the algorithm robust to changes in illumination or in the background, adaptive background modeling approaches have been proposed. Kalman filtering based methods are robust to lighting changes in the scene. See C. Ridder, O. Munkelt, et al., Adaptive Background Estimation and Foreground Detection using Kalman-filtering, Proc. of Intl. Conf. On Recent Advances in Mechatronics (ICRAM), 1995, 193-199. But these approaches recover slowly and do not handle bimodal backgrounds well. A Mixture of Gaussians (MoG) model has been proposed in C. Stauffer and W. E. L. Grimson, Adaptive Background Mixture Models for Real-time Tracking, CVPR99, 1999, 246-252. This model adapts slowly to a sudden lighting change. Attempts have been made to improve the MoG algorithm. See M. Cristani, M. Bicego, and V. Murino, Integrated Region- and Pixel-based Approach to Background Modeling, IEEE Workshop on Motion and Video Computing, Dec. 2002, 3-8, and P. KaewTraKulPong and R. Bowden, An Improved Adaptive Background Mixture Model for Real-time Tracking with Shadow Detection, Proc. 2nd European Workshop on Advanced Video Based Surveillance Systems (AVBS01), September 2001. A Hidden Markov Model has been used to describe global state changes. See B. Stenger, V. Ramesh, et al., “Topology Free Hidden Markov Models: Application to Background Modeling,” ICCV 2001, pp. 294-310. In another attempt, the Wallflower system was proposed to solve many of the common problems with background maintenance. See K. Toyama, J. Krumm, et al., Wallflower: Principles and Practice of Background Maintenance, ICCV99, 1999, 255-261.
All of the above-described algorithms use the intensity or color information of the pixels. However, intensity/color based background systems are susceptible to sudden lighting changes. Efforts have been made to incorporate other illumination-robust features for scene modeling. In one attempt, intensity and texture information is integrated to perform change detection, with the texture-based decision taken over a small neighborhood. See L. Li and M. K. H. Leung, Integrating Intensity and Texture Differences for Robust Change Detection, IEEE Trans. on Image Processing, Vol. 11, No. 2, February 2002, 105-112. Another attempt uses the fusion of color and gradient information. See O. Javed, K. Shafique, and M. Shah, A Hierarchical Approach to Robust Background Subtraction using Color and Gradient Information, Proc. Workshop on Motion and Video Computing, 2002, 22-27. The computation required by these approaches is intensive and may not be suitable for real-time implementation.
Two papers use DCT domain processing, one for background subtraction and the other for detecting obstructions and tracking moving objects. M. Lamarre and J. J. Clark perform background subtraction in the block-DCT domain (“Background subtraction using competing models in the block-DCT domain,” ICPR2002). The authors use different methods for smooth and sudden transition scenarios in the background subtraction algorithm. Smooth transitions of the DC coefficients are integrated into the background model with a steady-state Kalman filter, and sudden changes are detected and analyzed by Hidden Markov Models (HMMs). These models are computed using multiple competing HMMs over small neighborhoods. Background and transition matrix probabilities are estimated empirically. N. Amamoto and A. Fujii extract the moving vehicle by thresholding the mean value of the midband and high AC components of the DCT of the background difference image, and combining this with the predicted object region based on the previous object region and motion vector, in “Detecting Obstructions and Tracking Moving Objects by Image Processing Technique,” Electron. and Comm. in Japan, Part 3, Vol. 82, No. 11, 1999, pp. 28-37.
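For illustration, a transform-domain feature of the general kind described above, namely the mean magnitude of the higher-frequency AC coefficients of a block DCT, can be sketched as follows. This is a sketch under stated assumptions and does not reproduce the cited authors' implementations: the 4x4 block size, the function names, and the rule that selects coefficients by index sum `u + v >= min_index_sum` are hypothetical choices made for this example.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block (direct O(N^4) form)."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    return [
        [alpha(u) * alpha(v) * sum(
            block[x][y]
            * math.cos((2 * x + 1) * u * math.pi / (2 * n))
            * math.cos((2 * y + 1) * v * math.pi / (2 * n))
            for x in range(n) for y in range(n))
         for v in range(n)]
        for u in range(n)
    ]


def mean_ac_magnitude(coeffs, min_index_sum):
    """Mean |coefficient| over entries with u + v >= min_index_sum.

    Assumption: the index sum is used as a simple stand-in for frequency band.
    """
    n = len(coeffs)
    band = [abs(coeffs[u][v])
            for u in range(n) for v in range(n)
            if u + v >= min_index_sum]
    return sum(band) / len(band)


# Hypothetical 4x4 blocks of a background difference image.
flat = [[5.0] * 4 for _ in range(4)]                 # uniform offset only
edge = [[0.0, 0.0, 40.0, 40.0] for _ in range(4)]    # vertical edge

flat_score = mean_ac_magnitude(dct2(flat), min_index_sum=2)
edge_score = mean_ac_magnitude(dct2(edge), min_index_sum=2)
```

A uniform change in a block affects only the DC coefficient, so `flat_score` is numerically zero, whereas the block containing an edge produces a clearly nonzero `edge_score`. A decision could then be made by comparing the score to a threshold.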
It is desirable to provide an improved method for foreground segmentation of video sequences that is robust to illumination variation and to a changing background, and that is also easy to implement for real-time applications.