Automated security and surveillance systems typically employ video cameras or other image-capturing devices or sensors to collect image data. In the simplest systems, images represented by the image data are displayed for contemporaneous screening by security personnel and/or recorded for later reference after a security breach. In those systems, the task of detecting objects of interest is performed by a human observer. A significant advance occurs when the system itself is able to perform object detection and tracking, either partly or completely.
In a typical surveillance system, for example, one may be interested in tracking a detected object, such as a human, a vehicle, or an animal, that moves through the environment. Existing systems capable of tracking detected objects do so using motion prediction and tracking of selected features in consecutive frames of video. Other techniques, such as the scale-invariant feature transform (SIFT) method, attempt to precisely represent the appearance of an instance of an object such that the representation can be used to match multiple instances of the object irrespective of their temporal proximity. Known tracking systems, however, suffer from one or more of (1) inability to be trained, (2) lack of integration with object searching, indexing, and classification systems, (3) inadequate object tracking and search performance, and (4) ineffective cross-camera tracking capabilities.
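By way of illustration only, the motion-prediction approach mentioned above may be sketched as follows. The sketch assumes a constant-velocity motion model and greedy nearest-neighbor association between predicted positions and new detections; the function names (`predict`, `associate`, `update`), the dictionary layout of a track, and the `max_dist` gate are hypothetical choices made for this example and are not taken from any particular known system.

```python
from math import hypot


def predict(track):
    """Constant-velocity prediction of the track's position in the next frame."""
    (x, y), (vx, vy) = track["pos"], track["vel"]
    return (x + vx, y + vy)


def associate(tracks, detections, max_dist=50.0):
    """Greedily match each track to the nearest unclaimed detection.

    tracks: dict mapping a track id to {"pos": (x, y), "vel": (vx, vy)}
    detections: list of (x, y) object positions detected in the new frame
    max_dist: hypothetical gating distance, in pixels
    Returns a dict mapping track id to the index of its matched detection.
    """
    matches = {}
    free = set(range(len(detections)))
    for tid, track in tracks.items():
        px, py = predict(track)
        best, best_d = None, max_dist
        for j in free:
            dx, dy = detections[j]
            d = hypot(dx - px, dy - py)
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            matches[tid] = best
            free.remove(best)
    return matches


def update(tracks, detections, matches):
    """Move matched tracks to their detections and re-estimate velocity."""
    for tid, j in matches.items():
        old_x, old_y = tracks[tid]["pos"]
        new_x, new_y = detections[j]
        tracks[tid]["vel"] = (new_x - old_x, new_y - old_y)
        tracks[tid]["pos"] = (new_x, new_y)
```

A tracker of this kind relies only on position and velocity between consecutive frames. It carries no learned model of an object's appearance, which is one reason such an approach cannot be trained and has difficulty re-associating an object that leaves one camera's field of view and appears in another.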