Video cameras are widely used for surveillance. Video surveillance involves the acquisition of visual information from one or more video cameras and the identification or detection, in the acquired visual information, of events of interest, e.g. shoplifting or unauthorized entry. Detection of the events of interest can be accomplished either concurrently with video acquisition or later, following a period of storage.
A common shortcoming with current video surveillance systems is the poor quality of the surveillance video. In addition, the degree of coverage provided to a given area through video surveillance is often limited due to the expense associated with providing a high degree of coverage, since a high degree of coverage requires more cameras, wiring, storage and monitoring facilities. However, a lower degree of coverage increases the opportunity for events of interest to occur outside of the field of view of the deployed cameras. For example, when the acquired video is needed to investigate a bank robbery, the events of interest may have taken place out of the field of view of the deployed cameras, either by coincidence or design. Even for events of interest that occur within the field of view of the deployed cameras, objects, for example faces of the perpetrators or car license plate numbers, can be too small or indistinct in the video to be readily identified because of the limited visual acuity of the deployed cameras.
In general, a video surveillance application has a minimum camera resolution below which it is not practical or effective. In face recognition, surveillance and audio-visual speech recognition, for example, sufficiently high resolution images of the face are necessary for recognition to be practical. The area of coverage of such systems is usually limited by the need for resolution, since visual acuity is balanced against coverage area by varying the focal length of the video camera lenses. Therefore, a higher degree of visual acuity, i.e. a sharper image, results in a smaller coverage area and vice versa. Additional coverage can be achieved by adding cameras, at additional expense and with increased architectural complexity for the system. Ultrahigh resolution cameras with wide angle lenses have been proposed to alleviate the problem of decreased field of view with increased resolution; however, ultrahigh resolution cameras are expensive. In addition, the use of ultrahigh resolution cameras requires the replacement of existing cameras and even some of the ancillary monitoring equipment. The cost associated with installing these nonstandard ultrahigh resolution cameras inhibits their adoption and installation.
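The balance between visual acuity and coverage area described above can be illustrated with a minimal sketch, assuming an idealized pinhole camera model. The function names, the sensor width, the focal lengths and the image width used below are hypothetical values chosen for illustration and do not come from any particular camera.

```python
import math


def horizontal_fov_deg(sensor_width_mm, focal_length_mm):
    """Horizontal angle of view of an ideal pinhole camera, in degrees."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_length_mm)))


def coverage_width_m(sensor_width_mm, focal_length_mm, distance_m):
    """Width of the scene imaged at a given distance (similar triangles)."""
    return distance_m * sensor_width_mm / focal_length_mm


def pixels_per_meter(image_width_px, sensor_width_mm, focal_length_mm, distance_m):
    """Horizontal pixel density on an object located at the given distance."""
    width_m = coverage_width_m(sensor_width_mm, focal_length_mm, distance_m)
    return image_width_px / width_m


if __name__ == "__main__":
    # Assumed values: 4.8 mm wide sensor, 640 pixel wide image, object 10 m away.
    for focal_length_mm in (4.0, 8.0):
        print(
            focal_length_mm,
            round(horizontal_fov_deg(4.8, focal_length_mm), 1),
            round(coverage_width_m(4.8, focal_length_mm, 10.0), 2),
            round(pixels_per_meter(640, 4.8, focal_length_mm, 10.0), 1),
        )
```

Under these assumptions, doubling the focal length halves the width of the covered area at a given distance and doubles the number of pixels falling on an object at that distance, which is the trade-off the passage describes.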
Regardless of whether standard or ultrahigh resolution cameras are used, these cameras are typically fixed in place and provide a single fixed focal length. In many applications, however, the range of scales to be observed is practically unlimited, and the location of events of interest is difficult to predict. Therefore, fixed non-zoom cameras cannot provide the same level of functionality as moveable zoom cameras, which can also be high-resolution, for delivering detailed images of events of interest.
One proposed approach to using moveable zoom cameras deploys steerable, i.e. pan-tilt, cameras having a variable focal length, i.e. zoom. These types of cameras are known as pan-tilt-zoom (PTZ) cameras and can be moved to point at an area of interest and zoomed or focused to obtain a high-resolution image of an object within that area. This approach, however, is not without limitations. First, in order to aim and focus a camera on an object of interest within an area of interest, the object of interest needs to be identified. In addition, even if the object is identified, that object needs to be located in order to determine where to aim and focus the camera to obtain a high-resolution image of the object.
In most applications, the task of identifying objects of interest and the location of these objects is delegated to a human operator, for example a security guard situated in front of a panel of monitors. The security guard selects areas of interest, manually steers a camera to point at those areas, and manually focuses or zooms the camera on one or more objects within those areas. Successful application of this system requires a sufficient number of cameras and monitors to provide coverage of the larger areas of potential interest. If the operator is not available or is not looking at a potential area of interest, then events of interest can be missed. Therefore, attempts have been made to mitigate the limitations associated with the use of human camera operators.
Methods have been devised that connect a camera controller to a door switch. When the door is opened, the switch is activated. Activation of the switch causes the camera to automatically steer in the direction of the door and focus on the area of the door opening to obtain a close-up of any persons passing through the door opening.
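The door-switch arrangement can be sketched as a simple event handler that drives a camera to a stored pose. This is a minimal illustration only: the `Preset`, `PTZCamera` and `DoorSwitchTrigger` classes, and the pan, tilt and zoom values, are hypothetical stand-ins and do not correspond to any real camera controller interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    """A stored camera pose (hypothetical units: degrees and zoom factor)."""
    pan_deg: float
    tilt_deg: float
    zoom: float


class PTZCamera:
    """Stand-in for a camera controller; it only records the commanded pose."""

    def __init__(self):
        self.pose = Preset(pan_deg=0.0, tilt_deg=0.0, zoom=1.0)

    def move_to(self, preset):
        self.pose = preset


class DoorSwitchTrigger:
    """Steers the camera to the door preset whenever the switch is activated."""

    def __init__(self, camera, door_preset):
        self.camera = camera
        self.door_preset = door_preset

    def on_switch(self, door_open):
        # Only the opening of the door activates the switch; closing is ignored.
        if door_open:
            self.camera.move_to(self.door_preset)


if __name__ == "__main__":
    camera = PTZCamera()
    trigger = DoorSwitchTrigger(camera, Preset(pan_deg=35.0, tilt_deg=-10.0, zoom=4.0))
    trigger.on_switch(True)
    print(camera.pose)
```

The sketch also makes the limitation of this approach visible: the camera can only respond to the one location that was wired to a switch and stored in advance.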
In U.S. patent application Ser. No. 10/933,660, filed Sep. 3, 2004, Hampapur et al. describe a video surveillance system that uses sophisticated six-degree-of-freedom calibration of two or more cameras to triangulate the location of objects such as people's heads. The triangulation information is used to direct additional steerable cameras to point at the heads.
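The final step of such an approach, pointing a steerable camera at a location that has already been triangulated, can be illustrated with a minimal geometric sketch. It is not the method of the cited application; it assumes a shared world coordinate frame with the z axis pointing up, a pan angle measured about the z axis from the x axis, and a tilt angle measured as elevation above the horizontal. The function name and the coordinates are hypothetical.

```python
import math


def pan_tilt_to_target(camera_xyz, target_xyz):
    """Return (pan_deg, tilt_deg) that aims a camera at a 3D world point.

    Assumed conventions: z is up, pan is the rotation about z measured from
    the +x axis, and tilt is the elevation above the horizontal plane.
    """
    dx, dy, dz = (t - c for t, c in zip(target_xyz, camera_xyz))
    pan_deg = math.degrees(math.atan2(dy, dx))
    tilt_deg = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return pan_deg, tilt_deg


if __name__ == "__main__":
    # Assumed layout: camera mounted 3 m high, head located at a height of 1.7 m.
    pan, tilt = pan_tilt_to_target((0.0, 0.0, 3.0), (3.0, 3.0, 1.7))
    print(round(pan, 1), round(tilt, 1))
```

A real installation would additionally need the mounting orientation of each steerable camera, which is part of what the calibration mentioned above provides.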