Public venues such as shopping centres, parking lots and train stations are increasingly subjected to video surveillance with large-scale networks of video cameras. Application domains of large-scale video surveillance include security, safety, traffic management and business analytics. In one example application, a pan, tilt and zoom camera, Camera A, tracks a query object on site. When the query object is about to move out of the physical viewing limit of Camera A, another camera, Camera B, in the same network is assigned responsibility to take over tracking the object. The change in responsibility from Camera A to Camera B is often referred to as a “handoff” process. The handoff process usually happens between cameras with overlapping fields of view. In handoff, rapid object matching is performed given images of the objects from the two camera views.
Object matching from different camera viewpoints (or views) is difficult. Different cameras operate under different lighting conditions. Different objects may have similar visual appearance, and the same object (e.g., a person or a subject) can have different pose and posture across viewpoints.
One image processing method performs appearance-based object matching. The appearance-based object matching involves first determining visual features of a query object from a first view, then determining the same type of visual features of a candidate object from a second view. The difference between the visual features is then computed and compared against a threshold. If the difference is smaller than the threshold, the query object and the candidate object are said to match. Otherwise, the query object and the candidate object do not match.
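The appearance-based matching described above can be sketched as follows, using a normalised colour histogram as an illustrative visual feature and an L1 distance with a threshold. The feature choice, distance, and threshold value are assumptions for illustration, not the specific features of any particular method.

```python
import numpy as np

def colour_histogram(image, bins=16):
    # Per-channel colour histogram over an H x W x 3 image,
    # normalised to sum to 1 (a simple visual appearance feature).
    hist = np.concatenate(
        [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
         for c in range(3)]
    ).astype(float)
    return hist / hist.sum()

def appearance_match(query_img, candidate_img, threshold=0.25):
    # Match if the L1 distance between the two feature vectors
    # is below the threshold (threshold value is illustrative).
    diff = np.abs(colour_histogram(query_img)
                  - colour_histogram(candidate_img)).sum()
    return diff < threshold
```

As the passage notes, such a comparison is sensitive to lighting differences between views, since the histograms shift with illumination.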
Since different objects may have similar visual appearance, another image processing method skips appearance-based object matching altogether and opts for location-based object matching. Under the assumption of fully calibrated cameras and people moving on a planar ground, the feet location of each person as seen by a camera may be converted to a unique two dimensional (2D) global coordinate on the ground. If two people from two different cameras have the same feet coordinate on the ground, the two people are said to match. However, for the ground coordinate method to work, all cameras must be calibrated, which is a non-trivial task, especially for a large camera network that requires fast deployment. The assumption of a planar ground is also not applicable to many outdoor environments, where steps, stairs, and uneven terrain are present.
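Under the calibrated-camera, planar-ground assumption, the conversion from an image feet coordinate to a ground coordinate is typically a planar homography. A minimal sketch, assuming each camera's 3x3 image-to-ground homography is already known from calibration (the matrices and the metric tolerance here are hypothetical):

```python
import numpy as np

def feet_to_ground(homography, feet_xy):
    # Project a 2D image feet coordinate to a 2D ground-plane
    # coordinate using a 3x3 planar homography (homogeneous divide).
    x, y = feet_xy
    p = homography @ np.array([x, y, 1.0])
    return p[:2] / p[2]

def location_match(H1, feet1, H2, feet2, tolerance=0.2):
    # Two detections match if their projected ground coordinates
    # coincide within the tolerance (in ground units, e.g. metres).
    g1 = feet_to_ground(H1, feet1)
    g2 = feet_to_ground(H2, feet2)
    return np.linalg.norm(g1 - g2) < tolerance
```

The sketch makes the passage's limitation concrete: each camera needs its own accurate homography, and the single ground plane implied by the projection breaks down on steps or uneven terrain.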
To avoid the need for camera calibration and the planar ground assumption, another image processing method uses a location co-occurrence table to determine corresponding locations across two camera views. Two objects match if the locations of the objects in each view co-occur with high probability in the location co-occurrence table. The location co-occurrence table can be learnt from synchronised videos captured by two cameras with overlapping fields of view. For each camera view, foreground objects are segmented using background subtraction. The foreground masks are then quantised into cells. The location co-occurrence table is built for N1 cells in camera view 1 and N2 cells in camera view 2. The location co-occurrence table is initialised as an N1×N2 array of zero (0) values. For each synchronised frame with foreground objects, a pair of cross-camera foreground cells at location l1 in camera view 1 and location l2 in camera view 2 contributes one count to the (l1, l2) entry of the location co-occurrence table. The accumulation continues over multiple video frames with multiple foreground objects at different locations in the camera views. The corresponding locations across the two views accumulate high counts, while the non-corresponding locations have negligible counts. The co-occurred location counts can be normalised by the total count over the whole table to serve as a probability of location co-occurrence. The location co-occurrence table can be learnt during live camera operation, reducing the need for camera network calibration. However, the resolution of the lookup locations is limited due to foreground image quantisation. Co-occurred locations also do not enable matching when people walk close to each other in a group or when two people cross paths.
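The table-learning steps above can be sketched directly. In this sketch, each synchronised frame is represented by the sets of quantised foreground cell indices active in each view; the frame representation and the match threshold are assumptions for illustration.

```python
import numpy as np

def learn_cooccurrence(frames, n1, n2):
    # frames: iterable of (cells1, cells2) pairs, where cellsK is the set of
    # quantised foreground cell indices active in camera view K for one
    # synchronised frame.
    table = np.zeros((n1, n2))  # N1 x N2 table initialised to zero
    for cells1, cells2 in frames:
        for l1 in cells1:
            for l2 in cells2:
                # One count per cross-camera foreground cell pair per frame.
                table[l1, l2] += 1
    total = table.sum()
    # Normalise by the total count to obtain co-occurrence probabilities.
    return table / total if total > 0 else table

def locations_match(table, l1, l2, threshold):
    # Two locations correspond if they co-occur with high probability.
    return table[l1, l2] >= threshold
```

Because the counts are accumulated whenever cells are foreground simultaneously, two people walking together contribute counts to each other's cells, which is one source of the ambiguity the passage notes for groups and crossing paths.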
Thus, a need exists for an improved method of matching cross-camera moving targets.