Large surveillance networks deployed on buildings, highways, trains, metro stations, and the like integrate large numbers of cameras, sensors, and other information sources. Human operators typically cannot adequately control and monitor all the cameras within a large surveillance system. As such, many prior art approaches apply object detection and tracking techniques to identify and analyze events occurring within a camera's field of view. However, when searching through large amounts of video data to identify an event within the video image data, it is difficult to obtain reliable results.
For example, consider a surveillance camera that monitors a long-term parking lot. The parking lot attendant receives a complaint that a car was vandalized at some point in the past month. The prior art requires either a manual review of the tapes/files from the video camera for the entire month, or the use of a query box drawn around the particular parking spot, with the surveillance system retrieving all movement that occurred within the query box. The first approach is typically ineffective because an operator or group of operators must review hundreds of hours of video to observe an event that may have lasted only a few seconds. The second approach uses automatic video object tracking and meta-data indexing, with a standard relational database supporting the spatial queries. The drawback of this approach, however, is that the meta-data representation is very voluminous, making the indexing of large numbers of cameras impractical due to the heavy network traffic and the size of the database tables created.
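The second prior art approach above can be sketched as follows. This is a minimal illustration only: the table schema, column names, and sample values are assumptions chosen to show how a per-detection relational index answers a query-box search, not the schema of any particular system. Storing one row per tracked object per frame is what makes such tables voluminous.

```python
import sqlite3

# Hypothetical per-detection meta-data table: one row per tracked
# object per frame.  Schema and names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE detections (
        camera_id INTEGER,
        frame_ts  REAL,                       -- seconds since epoch
        object_id INTEGER,
        x1 REAL, y1 REAL, x2 REAL, y2 REAL    -- bounding box
    )""")

# Sample detections: object 7 moves through the parking spot,
# object 9 remains elsewhere in the scene.
rows = [
    (1, 100.0, 7,  40,  40,  60,  60),
    (1, 100.5, 7,  45,  42,  65,  62),
    (1, 100.0, 9, 300, 300, 320, 320),
]
conn.executemany("INSERT INTO detections VALUES (?,?,?,?,?,?,?)", rows)

# Spatial query: every detection whose bounding box overlaps the
# operator's query box (qx1, qy1, qx2, qy2) around the parking spot.
qx1, qy1, qx2, qy2 = 30, 30, 80, 80
hits = conn.execute(
    """SELECT object_id, frame_ts FROM detections
       WHERE x1 <= ? AND x2 >= ? AND y1 <= ? AND y2 >= ?""",
    (qx2, qx1, qy2, qy1)).fetchall()
print(hits)  # only object 7's detections intersect the query box
```

At typical frame rates (say, 30 frames per second), a single continuously tracked object generates on the order of millions of such rows per day per camera, which is the volume problem the passage describes.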