Currently, in many surveillance applications, such as store loss prevention or site security, a human operator relies on pan-tilt-zoom cameras to monitor a wide area with relatively few cameras. Because video monitoring is tedious, some approaches seek to automate part or all of it. For example, some approaches can generate an alert when a predefined event, such as motion in a region or movement of an object into a region (e.g., crossing over a tripwire), is detected in a video stream.
Typically, the alert is defined using image coordinates of a camera, e.g., by defining a region of interest within images acquired by the camera or by defining a line (e.g., tripwire) in the acquired images. However, such a definition does not accommodate a change in the physical location captured in the acquired images, e.g., due to movement of the camera. As a result, when the field of view of the camera is altered, an alert may fail to trigger and/or false alerts may be triggered.
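The limitation described above can be illustrated with a minimal sketch of a tripwire test defined purely in image coordinates. The function names, the pixel coordinates, and the 250-pixel shift standing in for a camera pan are all hypothetical values chosen for illustration; they are not taken from any particular surveillance system.

```python
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in image pixels


def orient(a: Point, b: Point, c: Point) -> float:
    """Signed area test: > 0 if c is on one side of line a->b, < 0 on the other."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def crosses_tripwire(prev: Point, curr: Point, wire_a: Point, wire_b: Point) -> bool:
    """Return True if the move prev->curr strictly crosses the segment wire_a->wire_b."""
    d1 = orient(wire_a, wire_b, prev)
    d2 = orient(wire_a, wire_b, curr)
    d3 = orient(prev, curr, wire_a)
    d4 = orient(prev, curr, wire_b)
    # The two segments cross when each one's endpoints lie on opposite
    # sides of the other segment.
    return d1 * d2 < 0 and d3 * d4 < 0


# Hypothetical tripwire, fixed in image coordinates.
WIRE_A: Point = (100.0, 200.0)
WIRE_B: Point = (300.0, 200.0)

# Object steps across the wire while the camera is in its original pose.
before_pan = crosses_tripwire((200.0, 190.0), (200.0, 210.0), WIRE_A, WIRE_B)

# Assumed effect of a pan: the same physical motion now appears 250 pixels
# to the right, while the tripwire stays where it was drawn in the image.
after_pan = crosses_tripwire((450.0, 190.0), (450.0, 210.0), WIRE_A, WIRE_B)

print(before_pan, after_pan)
```

In this sketch the same physical movement is detected before the pan but missed afterward, because the tripwire is tied to pixel positions rather than to the physical location it was meant to guard.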
In some surveillance approaches, multiple cameras are used together to track an individual and/or other object. In one hierarchical approach, a single stationary camera monitors a large area, while dynamic cameras are used to obtain clear images of areas/objects of interest. In another approach, the tracking of an object within the field of view of one camera is used to send adjustments to another camera whose field of view the object is expected to enter.
In other video applications, such as broadcasting video of sporting events, a physical location, such as a region in which an advertisement is inserted into the video or an indication of a first down on a football field, is tracked as a video camera is moved to follow action on the field. In these applications, a region is defined by one or more landmarks in advance of tracking the region in real time. Further, the region can comprise a unique color to assist with its tracking and/or determining whether any occlusions may be present. Still further, camera sensor data (e.g., pan, tilt, zoom) has been used to assist in locating the region in video.