There is an increasing need to provide security, efficiency, comfort, and safety for users of environments, such as buildings. Typically, this is done with sensors. When monitoring an environment with sensors, it is important to have a measure of a global context of the environment to make decisions about how best to deploy limited resources. This global context is important because decisions made based on single sensors, e.g., a single camera, are necessarily made with incomplete data. Therefore, the decisions are unlikely to be optimal. However, it is difficult to recover the global context using conventional sensors due to equipment cost, installation cost, and privacy concerns.
Some of the sensors can be relatively simple, e.g., motion detectors. Motion detectors can occasionally signal an unusual event with a single bit. Bits from multiple sensors can indicate temporal relationships between the events. Other sensors are more complex. For example, pan-tilt-zoom (PTZ) cameras generate a continuous stream of high-fidelity information about an environment, but at a very high data rate and with a high computational cost to interpret that data. Consequently, it is impractical to cover the entire environment with such complex sensors.
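As a minimal illustration of how single-bit events from multiple sensors can expose temporal relationships, the following sketch counts how often one sensor fires shortly after another. The function name, the event format of (timestamp, sensor identifier) pairs, and the two-second window are assumptions made for illustration only and are not taken from any described system.

```python
from collections import Counter

def co_activation_counts(events, window=2.0):
    """Count ordered pairs (a, b) where sensor b fires within
    `window` seconds after sensor a.

    `events` is a list of (timestamp_seconds, sensor_id) tuples;
    this format is an assumption of the sketch.
    """
    ordered = sorted(events)  # sort by timestamp
    counts = Counter()
    for i, (t_a, a) in enumerate(ordered):
        for t_b, b in ordered[i + 1:]:
            if t_b - t_a > window:
                break  # later events are even further away in time
            if a != b:
                counts[(a, b)] += 1
    return counts

# Hypothetical event log: "hall" is twice followed by "door".
events = [(0.0, "hall"), (1.2, "door"), (5.0, "hall"),
          (6.1, "door"), (20.0, "lobby")]
counts = co_activation_counts(events)
```

In this hypothetical log, the pair ("hall", "door") accumulates a count, which suggests that the two detectors observe adjacent spaces, while the isolated "lobby" event contributes nothing.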
Therefore, it makes sense to install a large number of simple sensors, such as motion detectors, and only a smaller number of complex PTZ cameras. However, it is labor intensive to specify the mapping between the data from a large network of simple sensors and the actions that the system needs to take based on that data, particularly when the placement of the sensors needs to change over time as the physical structure of the environment is reconfigured.
Therefore, it is desired to dynamically acquire action policies given a hybrid sensor network arranged in an environment, activity of users of the environment, and application specific feedback about the appropriateness of the actions.
In particular, it is desired to optimize expensive and limited resources, such as the attention of a lone security guard, a single monitoring station, network bandwidth of a video recording system, the placement of elevator cabs in a building, or the utilization of energy for heating, cooling, ventilation, or lighting.
Without loss of generality, the invention is concerned particularly with a PTZ camera. The PTZ camera enables a surveillance system to acquire high-fidelity video of events in an environment. However, the PTZ camera must be pointed at locations where interesting events occur. Thus, in this example application, the limited resource is the orientation of the camera.
When the PTZ camera is pointing at empty space, the resource is wasted. Some PTZ cameras can be pointed manually at an interesting event. However, this assumes that the event has already been detected. Other PTZ cameras aimlessly scan the environment in a repetitive pattern, oblivious to events. In either case, resources are wasted.
It is desired to improve the efficiency of limited, expensive resources, such as PTZ cameras. Specifically, it is desired to automatically point the camera at interesting events based on information acquired from simple sensors in a hybrid sensor network.
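The following is a minimal sketch of what acquiring such a pointing policy from feedback could look like. It is not the method of the invention. The class name, the preset names, and the +1/-1 feedback convention are all hypothetical, and the purely greedy selection rule is a simplification chosen to keep the example deterministic.

```python
from collections import defaultdict

class PresetPolicy:
    """Hypothetical policy mapping a triggered simple sensor to a
    camera preset, updated from scalar application feedback."""

    def __init__(self, presets):
        self.presets = list(presets)
        # score[sensor_id][preset] is a running sum of feedback values
        self.score = defaultdict(lambda: defaultdict(float))

    def choose(self, sensor_id):
        """Return the preset with the highest accumulated feedback
        for this sensor; ties go to the earliest preset."""
        scores = self.score[sensor_id]
        return max(self.presets, key=lambda p: scores[p])

    def update(self, sensor_id, preset, feedback):
        """Accumulate feedback (in this sketch, +1 for a useful
        view and -1 for a view of empty space)."""
        self.score[sensor_id][preset] += feedback

policy = PresetPolicy(["north", "south"])
first = policy.choose("hall")         # no feedback yet
policy.update("hall", "north", -1.0)  # camera saw empty space
policy.update("hall", "south", +1.0)  # camera saw the event
learned = policy.choose("hall")
```

Because the sketch never explores, it only revises its choice for a sensor when feedback is supplied for an alternative preset. A practical system would need some way of trying presets it has not yet evaluated.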
Conventionally, a geometric survey of the environment is performed with specialized tools prior to operating a surveillance system. Another method generates a known or easy-to-detect pattern of motion, such as having a person or robot navigate an empty environment following a predetermined path. This geometric calibration can then be used to manually construct an ad hoc rule-based surveillance system.
However, those methods severely constrain the system. It is desired to minimize the constraints on the users and on the environment. By enabling unconstrained motion of the users, it becomes possible to adapt the system to a large variety of environments. In addition, it becomes possible to eliminate the need to repeatedly perform geometric surveys as the physical structure of the environment is reconfigured over time.
Systems and methods to configure and calibrate a network of PTZ cameras are known, see Robert T. Collins and Yanghai Tsin, “Calibration of an outdoor active camera system,” IEEE Computer Vision and Pattern Recognition, pp. 528-534, June 1999; Richard I. Hartley, “Self-calibration from multiple views with a rotating camera,” The Third European Conference on Computer Vision, Springer-Verlag, pp. 471-478, 1994; S. N. Sinha and M. Pollefeys, “Towards calibrating a pan-tilt-zoom cameras network,” Peter Sturm, Tomas Svoboda, and Seth Teller, editors, Fifth Workshop on Omnidirectional Vision, Camera Networks and Non-classical cameras, 2004; Chris Stauffer and Kinh Tieu, “Automated multi-camera planar tracking correspondence modeling,” IEEE Computer Vision and Pattern Recognition, pp. 259-266, July 2003; and Gideon P. Stein, “Tracking from multiple view points: Self-calibration of space and time,” DARPA Image Understanding Workshop, 1998.
Interest in such calibration has been enhanced by the DARPA video surveillance and monitoring initiative. Most of that work has focused on classical calibration between the cameras and a fixed coordinate system of the environment.
Another method describes how to calibrate cameras with an overlapping field of view, S. Khan, O. Javed, and M. Shah, “Tracking in uncalibrated cameras with overlapping field of view,” IEEE Workshop on Performance Evaluation of Tracking and Surveillance, 2001. There, the objective is to find pair-wise camera field of view borders such that target correspondences in different views can be located, and successful inter-camera ‘hand-off’ can be achieved.
On a more practical side, a camera network with cooperating low and high resolution cameras in a relatively difficult outdoor environment, such as a highway, is described by M. M. Trivedi, A. Prati, and G. Kogut, “Distributed interactive video arrays for event based analysis of incidents,” IEEE International Conference on Intelligent Transportation Systems, pp. 950-956, September 2002.
Other methods combine autonomous systems with structured light, J. Barreto and K. Daniilidis, “Wide area multiple camera calibration and estimation of radial distortion,” Peter Sturm, Tomas Svoboda, and Seth Teller, editors, Fifth Workshop on Omnidirectional Vision, Camera Networks and Non-classical cameras, 2004; use calibration widgets, Patrick Baker and Yiannis Aloimonos, “Calibration of a multicamera network,” Robert Pless, Jose Santos-Victor, and Yasushi Yagi, editors, Fourth Workshop on Omnidirectional Vision, Camera Networks and Nonclassical cameras, 2003; or use surveyed landmarks, Robert T. Collins and Yanghai Tsin, “Calibration of an outdoor active camera system,” IEEE Computer Vision and Pattern Recognition, pp. 528-534, June 1999.
However, most of those methods are impractical because they require too much labor, in the case of calibration tools, place too many constraints on the environment, in the case of structured light, or require manually surveyed landmarks. In any case, those methods assume that calibration is done prior to operating the system, and make no provision for re-calibrating the system dynamically during operation as the environment is reconfigured.
Those problems are addressed by Stein and by Stauffer et al. They use tracking data to estimate transforms to a common coordinate system for their camera network. They do not distinguish between setup and operational phases. Rather, any tracking data can be used to calibrate, or re-calibrate, their system. However, neither of those methods directly addresses the question of PTZ cameras. More importantly, those methods place severe constraints on the sensors used in the network. The sensors must acquire very detailed positional data for moving objects, and must also be able to differentiate objects to successfully track the objects. This is true because tracks, and not individual observations, are the basic unit used in their calibration process.
All the methods described above require the acquisition of a detailed geometric model of the sensor network and the environment.
Another method calibrates a network of non-overlapping cameras, Ali Rahimi, Brian Dunagan, and Trevor Darrell, “Simultaneous calibration and tracking with a network of non-overlapping sensors,” IEEE Computer Vision and Pattern Recognition, pp. 187-194, June 2004. However, that method requires the tracking of a moving object.
It is desired to use complex PTZ cameras that are responsive to events detected by simple sensors, such as motion sensors. Specifically, it is desired to observe the events with the PTZ cameras without specialized tracking sensors. Moreover, it is desired to detect and track events generated by multiple users.