The current heightened sense of security and declining cost of camera equipment have increased the use of closed-circuit television (CCTV) surveillance systems. Such systems have the potential to reduce crime, prevent accidents, and generally increase security in a wide variety of environments.
As the number of cameras in a surveillance system increases, the amount of information to be processed and analyzed also increases. Computer technology has helped alleviate this raw data-processing task, resulting in a new breed of monitoring device—the computer-aided surveillance (CAS) system. CAS technology has been developed for various applications. For example, the military has used computer-aided image processing to provide automated targeting and other assistance to fighter pilots and other personnel. In addition, CAS has been applied to monitor activity in environments such as swimming pools, stores, and parking lots.
A CAS system monitors “objects” (e.g., people, inventory, etc.) as they appear in a series of surveillance video frames. One particularly useful monitoring task is tracking the movements of objects in a monitored area. To achieve more accurate tracking information, the CAS system can utilize knowledge about the basic elements of the images depicted in the series of video frames.
A simple surveillance system uses a single camera connected to a display device. More complex systems can have multiple cameras and/or multiple displays. The type of security display often used in retail stores and warehouses, for example, periodically switches the video feed displayed on a single monitor to provide different views of the property. Higher-security installations such as prisons and military installations use a bank of video displays, each showing the output of an associated camera. Because most retail stores, casinos, and airports are quite large, many cameras are required to sufficiently cover the entire area of interest. In addition, even under ideal conditions, single-camera tracking systems generally lose track of monitored objects that leave the field-of-view of the camera.
To avoid overloading human attendants with visual information, the display consoles for many of these systems generally display only a subset of all the available video data feeds. As such, many systems rely on the attendant's knowledge of the floor plan and/or typical visitor activities to decide which of the available video data feeds to display.
Unfortunately, developing a knowledge of a location's layout, typical visitor behavior, and the spatial relationships among the various cameras imposes a training and cost barrier that can be significant. Without intimate knowledge of the store layout, camera positions and typical traffic patterns, an attendant cannot effectively anticipate which camera or cameras will provide the best view, resulting in a disjointed and often incomplete visual records. Furthermore, video data to be used as evidence of illegal or suspicious activities (e.g., intruders, potential shoplifters, etc.) must meet additional authentication, continuity and documentation criteria to be relied upon in legal proceedings. Often criminal activities can span the fields-of-view of multiple cameras, and possibly be out of view of any camera for some period of time. Video that is not properly annotated with date, time, and location information, and which includes temporal or spatial interruptions may, not be reliable as evidence of an event or crime.