Video surveillance is of critical concern in many areas of life. One problem with video as a surveillance tool is that monitoring it may be very labor intensive. Recently, intelligent video surveillance systems have been proposed as solutions to the problem of automated video monitoring. See, for example, U.S. Pat. No. 6,696,945, “Video Tripwire,” and U.S. patent application Ser. No. 09/987,707, “Surveillance System Employing Video Primitives,” both of which are incorporated herein by reference. One application of video surveillance is the detection of human beings and their behaviors. Unfortunately, the science of computer vision, which underlies automated video monitoring, has limitations with respect to recognizing individual targets in overhead camera views, such as those used in residential, commercial, and home monitoring applications.
Current video surveillance systems (see, for example, C. Stauffer, W. E. L. Grimson, “Learning Patterns of Activity Using Real-Time Tracking,” IEEE Trans. PAMI, 22(8):747-757, August 2000; and R. Collins, A. Lipton, H. Fujiyoshi, and T. Kanade, “Algorithms for Cooperative Multisensor Surveillance,” Proceedings of the IEEE, Vol. 89, No. 10, October, 2001, pp. 1456-1477, both of which are incorporated herein by reference) have two basic limitations. First, groups of targets may often be crowded together and detected as a single “blob.” The blob may be correctly labeled as “human group,” but the number of individuals in the group may not be ascertained. Second, inanimate objects, such as furniture, strollers, and shopping carts, generally may not be disambiguated from legitimate targets (particularly in overhead camera shots). In addition, other “human detection” algorithms (see, for example, the techniques discussed at and U.S. patent application Ser. No. 11/139,986, “Human Detection and Tracking for Security Applications,” filed May 31, 2005, both of which are incorporated herein by reference) rely on more oblique camera views and specific human models to recognize humans, but generally do not perform well for overhead camera views.