Public venues such as shopping centres, parking lots and train stations are increasingly subject to surveillance with large-scale networks of video cameras. Application domains of large-scale video surveillance include security, safety, traffic management and business analytics.
Most commercially available surveillance camera systems rely primarily on static cameras for object detection and tracking. These static cameras generally have a wide field of view, typically between 80 and 100 degrees, with a resolution limited to about 1920×1080 pixels even for high-end cameras. This limited resolution makes tasks that are sensitive to resolution and image quality, such as face detection, face recognition, forensic examination, and person re-identification using soft biometrics, very difficult, especially where subjects are imaged from a distance. This issue has led to the extensive use of pan-tilt-zoom (PTZ) cameras for operator-based security applications, as PTZ cameras can obtain close-up imagery of subjects.
One major challenge for large-scale surveillance networks with a limited number of PTZ cameras is camera scheduling, which is the problem of computing an optimal set of camera assignments for satisfying multiple different types of tasks. For surveillance applications, one such task is to capture high-quality images of all objects in the environment. For person identification, the tasks also include capturing frontal face images of all human targets. Another task is to detect suspicious target activities by persistently tracking identified targets. Yet another example is to monitor doorways of secure areas.
In the following discussion, the term “camera scheduling” will be understood to include the terms “camera assignment” and “camera planning”.
Camera scheduling is a challenging problem for several reasons. Human motion is highly stochastic in nature. The targets should ideally be captured under viewing conditions suitable for a given task. As an example, for person identification, high-quality frontal face images are desired. Alternatively, for behaviour analysis, the profile view of a suspicious target may be preferred. However, public venues are often characterized by crowds of uncooperative targets moving in uncontrolled environments with varying and unpredictable distance, speed and orientation relative to the camera. Furthermore, high object coverage is usually required in a surveillance camera network, where the number of cameras is usually far smaller than the number of subjects in the scene. Also, depending on the specific application, a surveillance system may be required to satisfy multiple different types of tasks. For example, the system may need to simultaneously capture frontal face images for all targets and persistently track suspicious tagged targets. In addition, the scheduling algorithm may need to operate optimally with a varying number of PTZ cameras. For example, it should be easy to add a new PTZ camera when the network is reconfigured.
One known method for camera scheduling assigns cameras to targets sequentially in a round-robin fashion to achieve uniform coverage. This approach tends to have a long execution time and a high miss rate.
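As an illustration only, the round-robin idea can be sketched in Python as follows. The function name `round_robin_schedule`, the camera and target identifiers, and the notion of fixed-length time slots are assumptions made for this sketch and are not taken from the known method itself.

```python
from itertools import cycle


def round_robin_schedule(camera_ids, target_ids, num_slots):
    """Return one {camera: target} mapping per time slot (hypothetical helper).

    Targets are handed out in a fixed cyclic order, ignoring how soon each
    target will leave the scene or how well it can currently be imaged.
    """
    if not target_ids:
        return [{} for _ in range(num_slots)]
    next_target = cycle(target_ids)
    schedule = []
    for _ in range(num_slots):
        # Each camera simply takes the next target in the cycle.
        schedule.append({cam: next(next_target) for cam in camera_ids})
    return schedule


# Example with two PTZ cameras and three targets (identifiers are made up).
schedule = round_robin_schedule(["ptz0", "ptz1"], ["a", "b", "c"], 3)
```

In this sketch a target is revisited only after every other target has been served, regardless of urgency, which is one plausible reading of why such a scheme can miss targets that leave the scene early.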
Another method formulates the scheduling problem for a single PTZ camera as a Kinetic Travelling Salesperson Problem and ranks the targets by estimated deadlines, i.e., by when the targets are expected to leave the surveillance area. An optimal subset of the targets, which satisfies the deadline constraint, is obtained through an exhaustive search. The main disadvantage of this approach is that it cannot be easily extended to multiple camera scenarios.
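A heavily simplified sketch of deadline-ranked exhaustive search for a single camera is given below. It assumes a fixed capture time per target and static deadlines, and it ignores camera travel time and target motion, so it does not reproduce the kinetic aspect of the formulation. The function name `best_deadline_plan` and the example values are hypothetical.

```python
from itertools import combinations, permutations


def best_deadline_plan(deadlines, capture_time):
    """Return the longest capture order in which every target meets its deadline.

    deadlines: dict mapping target id -> time (s) at which the target leaves.
    capture_time: assumed constant time (s) the camera spends on each target.
    """
    ranked = sorted(deadlines, key=deadlines.get)  # earliest deadline first
    # Try the largest subsets first so the first feasible plan is also maximal.
    for size in range(len(ranked), 0, -1):
        for subset in combinations(ranked, size):
            for order in permutations(subset):
                elapsed = 0.0
                feasible = True
                for target in order:
                    elapsed += capture_time
                    if elapsed > deadlines[target]:
                        feasible = False
                        break
                if feasible:
                    return order
    return ()


# Three targets, but the single camera only has time to capture two of them.
plan = best_deadline_plan({"a": 2.0, "b": 4.5, "c": 3.0}, capture_time=2.0)
```

The search enumerates every ordering of every subset, so its cost grows factorially with the number of targets; covering several cameras would enlarge the search space further.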
Another method addresses the above disadvantage by treating scheduling as a combinatorial search problem, finding a plan that fits into a given time horizon while maximising a heuristically defined reward function based on target-camera distance, frontal viewing direction, and PTZ limits. A greedy best-first-search strategy is used to find a good target-camera assignment in real time given a pre-defined time budget for planning. However, this method works best when the time horizon is large and is generally slow to react to new targets.
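The following sketch shows one way a greedy, reward-driven assignment of this kind might look. The reward formula, the 170-degree pan limit, and the names `reward` and `greedy_assign` are assumptions chosen for illustration; the sketch assigns a single step only and omits the time horizon and planning budget of the method described above.

```python
import heapq
import math


def reward(distance_m, gaze_offset_deg, pan_deg, pan_limit_deg=170.0):
    """Hypothetical reward: higher for near, camera-facing, reachable targets."""
    if abs(pan_deg) > pan_limit_deg:
        return 0.0  # target lies outside the assumed pan range
    frontal = max(0.0, math.cos(math.radians(gaze_offset_deg)))
    return frontal / (1.0 + distance_m)


def greedy_assign(candidates):
    """Greedily pick the best remaining (camera, target) pair.

    candidates: iterable of (camera_id, target_id, reward_value).
    Returns a dict camera_id -> target_id with each target used at most once.
    """
    # heapq is a min-heap, so rewards are negated to pop the largest first.
    heap = [(-r, cam, tgt) for cam, tgt, r in candidates if r > 0.0]
    heapq.heapify(heap)
    assignment, used_targets = {}, set()
    while heap:
        _, cam, tgt = heapq.heappop(heap)
        if cam in assignment or tgt in used_targets:
            continue
        assignment[cam] = tgt
        used_targets.add(tgt)
    return assignment


assignment = greedy_assign([
    ("ptz0", "a", 0.9), ("ptz0", "b", 0.5),
    ("ptz1", "a", 0.8), ("ptz1", "b", 0.1),
])
```

A greedy choice is fast but not necessarily optimal: in the example the pairing found has a total reward of 1.0, whereas assigning `ptz0` to `b` and `ptz1` to `a` would give 1.3.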
Another related method solves the camera scheduling problem based on Markov decision processes. This approach attempts to find a situation-action mapping (called a policy) that specifies the best action to take for each situation under uncertainty. The framework explicitly models the temporal evolution of the states of the targets and designs a policy for action selection based on a reward function. This approach requires setup time for generating a situation-action mapping for a scene and does not work for continuous state spaces. Moreover, a new situation-action mapping must be recomputed every time a PTZ camera is added to or removed from a camera network.
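To make the notion of a situation-action mapping concrete, the sketch below runs standard value iteration on a toy discrete Markov decision process. The states, actions, transition probabilities, and rewards are invented for illustration and are not taken from the related method; value iteration is used here only as a common way to compute a policy.

```python
def value_iteration(model, gamma=0.9, sweeps=200):
    """Compute a policy for a small discrete MDP.

    model[state][action] is a list of (probability, next_state, reward).
    Returns (policy, values), where policy maps each state to its best action.
    """
    values = {s: 0.0 for s in model}

    def q_value(state, action, current):
        return sum(p * (r + gamma * current[nxt])
                   for p, nxt, r in model[state][action])

    for _ in range(sweeps):
        values = {s: max(q_value(s, a, values) for a in model[s]) for s in model}

    policy = {s: max(model[s], key=lambda a: q_value(s, a, values))
              for s in model}
    return policy, values


# Toy model (all numbers are assumptions): a target either faces the camera
# or faces away, and a capture is only rewarded when the target is facing.
toy_model = {
    "facing": {"capture": [(1.0, "done", 1.0)],
               "wait": [(0.5, "facing", 0.0), (0.5, "away", 0.0)]},
    "away": {"capture": [(1.0, "away", 0.0)],
             "wait": [(0.5, "facing", 0.0), (0.5, "away", 0.0)]},
    "done": {"idle": [(1.0, "done", 0.0)]},
}
policy, values = value_iteration(toy_model)
```

The resulting mapping is a lookup table over an enumerated state set, which is consistent with the limitations noted above: the table must be built before use, and changing the set of cameras changes the state and action sets and so requires recomputation.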
Yet another method uses gaze estimates from a static camera to optimise the control of active cameras with the aim of maximising the likelihood of surveillance targets being correctly identified. The method maintains a latent variable, for each target, representing whether the identity of the target has been ascertained. Moreover, the method measures the expected information gain with respect to an image gallery from observing each target based on the gaze estimates, the field of view of the PTZ cameras and the performance of the identification process, and subsequently selects the target-camera assignment that gives the largest information gain. This method avoids repeat observations of identified targets, and hence captures more targets in an environment. However, the method requires a person of interest to be known in advance, and accordingly cannot be used to capture faces for offline search.
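The selection rule can be illustrated with a minimal sketch, assuming a binary latent identity variable and a binary match/no-match observation. Here `prior` stands for the current belief about a target, while `tpr` and `fpr` are assumed true-positive and false-positive rates of the identification process under the predicted gaze and field of view; all names and numbers are hypothetical, and the actual method may model these quantities differently.

```python
import math


def entropy(p):
    """Binary entropy in bits; zero when the outcome is already certain."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def expected_information_gain(prior, tpr, fpr):
    """Prior entropy minus expected posterior entropy after one observation."""
    p_pos = prior * tpr + (1.0 - prior) * fpr  # probability of a reported match
    p_neg = 1.0 - p_pos
    gain = entropy(prior)
    if p_pos > 0.0:
        gain -= p_pos * entropy(prior * tpr / p_pos)
    if p_neg > 0.0:
        gain -= p_neg * entropy(prior * (1.0 - tpr) / p_neg)
    return gain


def select_assignment(candidates):
    """Pick the (camera, target) pair with the largest expected gain.

    candidates: list of (camera_id, target_id, prior, tpr, fpr).
    """
    best = max(candidates,
               key=lambda c: expected_information_gain(c[2], c[3], c[4]))
    return best[0], best[1]


best_pair = select_assignment([
    ("ptz0", "a", 1.0, 0.9, 0.05),  # identity already certain: zero gain
    ("ptz0", "b", 0.5, 0.9, 0.05),  # uncertain, favourable gaze
    ("ptz0", "c", 0.5, 0.6, 0.30),  # uncertain, poor gaze
])
```

Because a target whose belief is already certain yields zero expected gain, the rule naturally steers cameras away from targets that have already been identified.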
All of the above camera scheduling methods suffer from at least one of the following disadvantages: 1) the identities of targets must be known in advance; 2) poor scalability when coping with new targets or new configurations of the camera network; and 3) difficulty in extending the method to meet multiple different types of tasks simultaneously. Practical applications, as noted above, thus present unfavourable conditions for known camera scheduling methods.