Object detection and tracking is an important but difficult task. Many automatic surveillance systems need to collect data for detecting objects of interest. The challenges of object detection and tracking, especially at nighttime, stem mainly from the variability, magnitude and speed of lighting changes. Often, the area of interest is very unevenly illuminated. At places with good lighting, objects can be seen very well by a camera with a night mode, apart from the loss of color. However, at places with little or no lighting, objects without self-illumination (e.g., humans) can have very low contrast, while objects with self-illumination (e.g., vehicles) can cause drastic changes to the entire scene. Thermal cameras measure surface temperature and are therefore less sensitive to lighting changes. However, they are expensive and do not capture appearance information as well as a visible light camera does. Therefore, in most surveillance applications, a regular visible light camera (possibly with a night mode) is used both during the daytime and at nighttime.
Background modeling and subtraction is a widely used approach for moving object detection and tracking. However, rapid changes in image gain make it difficult to detect moving objects using background subtraction. The appearance of a self-illuminated vehicle also changes significantly, making appearance-based tracking less likely to succeed as well. Most current visual tracking work performs detection when the object enters the scene and then tracks it. In difficult scenarios like these, detection from a single frame or a small number of frames is less likely to be accurate, and an erroneous detection will in turn ruin the tracking.
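To illustrate why a fast gain change defeats background subtraction, the following is a minimal sketch of a generic running-average background model; it is not the method of any work cited here. The function names, the learning rate `alpha = 0.05` and the threshold of 25 grey levels are hypothetical choices for illustration. The example compares a frame containing one small object with the same frame after the camera gain doubles.

```python
def update_background(bg, frame, alpha=0.05):
    """Exponential running average: bg <- (1 - alpha) * bg + alpha * frame."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(bg, frame)]


def foreground_mask(bg, frame, threshold=25.0):
    """Mark a pixel as foreground when |frame - bg| exceeds the (hypothetical) threshold."""
    return [[abs(f - b) > threshold for b, f in zip(brow, frow)]
            for brow, frow in zip(bg, frame)]


def foreground_fraction(mask):
    """Fraction of pixels flagged as foreground."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


# Hypothetical 4x4 grayscale background at intensity 50.
background = [[50.0] * 4 for _ in range(4)]

# Frame A: a single bright object pixel, lighting unchanged.
frame_object = [row[:] for row in background]
frame_object[1][1] = 120.0

# Frame B: the same scene after the image gain doubles.
frame_gain = [[2.0 * v for v in row] for row in frame_object]

print(foreground_fraction(foreground_mask(background, frame_object)))  # 0.0625
print(foreground_fraction(foreground_mask(background, frame_gain)))    # 1.0

# A slowly adapting model cannot absorb the jump in one step.
background_next = update_background(background, frame_gain)
print(foreground_fraction(foreground_mask(background_next, frame_gain)))  # 1.0
```

With unchanged lighting, only the object pixel (1 of 16) is flagged; after the gain change every pixel is flagged, so the object can no longer be separated from the scene. A faster learning rate would absorb the change sooner, but it would also absorb slow-moving objects into the background.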
Many techniques have been proposed for nighttime detection and tracking. Most of them either make assumptions about the images, such as the “hot-spot” assumption, or are direct extensions of daytime algorithms. For example, the approaches in many technical papers exploit the property, in thermal imagery, that human bodies are hotter (or brighter) than the surrounding environment. Such papers include B. Bhanu and J. Han, “Kinematic based human motion analysis in infrared sequences,” Proceedings of the Sixth IEEE Workshop on Applications of Computer Vision, pages 208-212; F. Xu, X. Liu, and K. Fujimura, “Pedestrian detection and tracking with night vision,” IEEE Transactions on Intelligent Transportation Systems, vol. 6, no. 1, pages 63-71; A. Yilmaz, K. Shafique, and M. Shah, “Tracking in airborne forward looking infrared imagery,” Image and Vision Computing, vol. 21, no. 7, pages 623-635; and H. Nanda and L. Davis, “Probabilistic template based pedestrian detection in infrared videos,” IEEE Intelligent Vehicles Symposium.
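The “hot-spot” assumption can be illustrated with a minimal, hypothetical sketch: pixels of a thermal image that exceed the global mean by more than `k` standard deviations are taken to be warm objects. The function names, the global-statistics threshold and `k = 2.0` are illustrative assumptions, not the procedure of any of the cited papers. The second scene, in which sun-heated ground is warmer than the person, shows how the assumption can fail.

```python
import statistics


def hot_spot_mask(image, k=2.0):
    """Flag pixels hotter than mean + k * std of the whole image (hypothetical rule)."""
    pixels = [v for row in image for v in row]
    threshold = statistics.mean(pixels) + k * statistics.pstdev(pixels)
    return [[v > threshold for v in row] for row in image]


def count_detections(mask):
    """Number of pixels flagged as hot spots."""
    return sum(sum(row) for row in mask)


# Hypothetical 5x5 temperature maps (degrees Celsius) with a person at the centre (37 C).
cool_night = [[20.0] * 5 for _ in range(5)]
cool_night[2][2] = 37.0

hot_pavement = [[45.0] * 5 for _ in range(5)]  # ground warmer than the body
hot_pavement[2][2] = 37.0

print(count_detections(hot_spot_mask(cool_night)))    # 1: the person is a hot spot
print(count_detections(hot_spot_mask(hot_pavement)))  # 0: the person is missed
```

The same rule that isolates the person on a cool night misses the person entirely once the surroundings are warmer than the body.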
The common problem with the “hot-spot” assumption in the above-mentioned papers is that it does not always hold in complex environments, because temperatures change over the course of the day and across the seasons of the year. To make detection more reliable, various methods have been proposed, including support vector machines and other learning-based methods, robust background subtraction with contour completion/closing, fusion-based background subtraction using contour saliency, probabilistic templates, and the W4 system, which employs a combination of shape analysis and tracking. However, most of these approaches are designed for thermal images and are not directly applicable to visible light cameras.
In most visual tracking work, such as the mean shift tracker disclosed by G. Bradski, “Computer vision face tracking for use in a perceptual user interface,” Intel Technology Journal, vol. 2, no. 2, and by D. Comaniciu, V. Ramesh, and P. Meer, “Kernel-based object tracking,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 5, pages 564-577; the particle filter based tracker of M. Isard and A. Blake, “Condensation—conditional density propagation for visual tracking,” Int'l Journal of Computer Vision, vol. 29, no. 1, pages 5-28; and the Markov chain Monte Carlo method for object tracking of T. Zhao and R. Nevatia, “Tracking multiple humans in complex situations,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 9, pages 1208-1221, temporal coherence is used only in a two-frame fashion. All these methods give priority to spatial information and exploit temporal coherence only between two adjacent frames. P. Kornprobst and G. G. Medioni, “Tracking segmented objects using tensor voting,” Proc. IEEE Conf. Comp. Vision Pattern Recognition, pages 2118-2125, used the tensor voting framework to group object trajectories in the spatio-temporal space. However, this method assumes that objects can be detected fairly accurately, because it groups the centroids of the detected objects. Moreover, compared with the smoothness constraint used in tensor voting, a small field of view requires a stronger model to obtain robust results.
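The two-frame use of temporal coherence can be seen in a minimal sketch of one mean shift tracking step. It assumes a precomputed weight map for the current frame (for example, a back-projection of the target's color histogram) and uses a flat square window rather than the kernel profiles of Comaniciu et al.; the function name, window size and convergence tolerance are hypothetical. The only temporal information used is the object's position in the previous frame, which seeds the search.

```python
def mean_shift(weights, start, half=2, max_iter=20, eps=1e-3):
    """Shift a (2*half+1)-pixel square window to its weighted centroid until it converges.

    weights: 2-D list of non-negative per-pixel weights for the CURRENT frame.
    start:   (row, col) of the object in the PREVIOUS frame, the only temporal cue used.
    """
    rows, cols = len(weights), len(weights[0])
    cy, cx = start
    for _ in range(max_iter):
        # Clip the window around the current estimate to the image bounds.
        r0, r1 = max(0, round(cy) - half), min(rows, round(cy) + half + 1)
        c0, c1 = max(0, round(cx) - half), min(cols, round(cx) + half + 1)
        total = sum_r = sum_c = 0.0
        for r in range(r0, r1):
            for c in range(c0, c1):
                w = weights[r][c]
                total += w
                sum_r += w * r
                sum_c += w * c
        if total == 0.0:
            break  # no support inside the window: the track is lost
        new_y, new_x = sum_r / total, sum_c / total
        moved = abs(new_y - cy) + abs(new_x - cx)
        cy, cx = new_y, new_x
        if moved < eps:
            break
    return cy, cx


# Hypothetical 10x10 weight map: the object now occupies rows 5-6, columns 6-7.
weights = [[0.0] * 10 for _ in range(10)]
for r in (5, 6):
    for c in (6, 7):
        weights[r][c] = 1.0

print(mean_shift(weights, start=(4, 4)))  # (5.5, 6.5)
```

Seeded at the previous position (4, 4), the window climbs to the object's centroid. Seeded far from the object, for instance at (0, 0), the window contains no weight and the estimate does not move, which illustrates why relying only on adjacent-frame coherence is fragile when the detection or the previous estimate is poor.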
The work most closely related to detecting and tracking objects is the video retrieval method using spatio-temporal descriptors disclosed by D. DeMenthon and D. Doermann, “Video retrieval of near-duplicates using k-nearest neighbor retrieval of spatiotemporal descriptors,” Multimedia Tools and Applications (in press), in which spatio-temporal event volumes are extracted using a hierarchical mean shift algorithm in a 7-D space. However, this approach suffers from high computational complexity because it operates in a 7-D space.
Thus, a need exists in the art to build an improved system that overcomes the deficiencies of the prior art and provides accurate and robust detection of moving objects in real time. There is a further need in the art for a system that is insensitive to transient noise, handles occlusion naturally, and conveniently enforces physical constraints.