Public venues such as shopping centres, parking lots and train stations are increasingly subject to surveillance using large-scale networks of video cameras. Application domains of large-scale video surveillance include security, safety, traffic management and business analytics. In one example application from the security domain, a security officer may want to view any video feed containing a particular suspicious person in order to identify undesirable activities. In another example from the business analytics domain, a shopping centre may wish to track customers across multiple cameras in order to build a profile of shopping habits.
A key task in video surveillance is rapid and robust object matching across multiple camera views. In one example, called “hand-off”, object matching is applied to persistently track multiple objects across a first camera and a second camera with overlapping fields of view. In another example, called “re-identification”, object matching is applied to locate a specific object of interest across multiple cameras in a network with non-overlapping fields of view. In the following discussion, the term “object matching” will be understood to refer to “hand-off”, “re-identification”, “object identification” and “object recognition”.
Robust object matching is difficult for several reasons. Firstly, many objects may have similar appearance, such as a crowd of commuters on public transport wearing similar business attire. Furthermore, the viewpoint (i.e. the orientation and distance of an object in the field of view of a camera) can vary significantly between cameras in the network. Finally, lighting, shadows and other photometric properties including focus, contrast, brightness and white balance can vary significantly between cameras and locations. In one example, a single network may simultaneously include outdoor cameras viewing objects in bright daylight, and indoor cameras viewing objects under artificial lighting. Photometric variations may be exacerbated when cameras are configured to use automatic focus, gain, exposure and white balance settings.
One object matching method extracts an “appearance signature” for each object and uses the signature to determine a similarity between different objects. Throughout this description, the term “appearance signature” refers to a set of values summarizing the appearance of an object or region of an image, and will be understood to include within its scope the terms “appearance model”, “feature descriptor” and “feature vector”.
One method of appearance-based object re-identification models the appearance of an object as a vector of low-level features based on colour, texture and shape. The features are extracted from an exemplary image of the object in a vertical region around the head and shoulders of the object. Re-identification is based in part on determining an appearance dissimilarity score based on the ‘Bhattacharyya distance’ between feature vectors extracted from images of candidate objects and the object of interest. The object of interest is matched to a candidate with the lowest dissimilarity score. However, the appearance dissimilarity may be large for the same object viewed under different photometric conditions.
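The dissimilarity scoring described above can be illustrated with a minimal sketch. It assumes each appearance signature is a histogram of non-negative feature counts, and it uses the common definition of the Bhattacharyya distance as the negative logarithm of the Bhattacharyya coefficient; the function names `bhattacharyya_distance` and `best_match` are hypothetical and are not taken from the method being described.

```python
import math


def normalise(hist):
    """Scale a histogram of non-negative counts so that it sums to 1."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must have positive total mass")
    return [h / total for h in hist]


def bhattacharyya_distance(p, q):
    """Return -ln(BC), where BC = sum_i sqrt(p_i * q_i) (assumed definition)."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    p, q = normalise(p), normalise(q)
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    bc = min(bc, 1.0)  # guard against rounding slightly above 1
    return math.inf if bc == 0.0 else -math.log(bc)


def best_match(query, candidates):
    """Pick the candidate whose signature has the lowest dissimilarity score.

    `candidates` maps a hypothetical candidate label to its histogram.
    """
    scores = {name: bhattacharyya_distance(query, sig)
              for name, sig in candidates.items()}
    return min(scores, key=scores.get), scores
```

Because the score depends directly on the feature values, a change in lighting that shifts the colour histogram of the same object would raise the score, which is the limitation noted above.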
In one method for appearance matching under photometric variations, a region of interest in an image is divided into a grid of cells, and the average intensity, horizontal intensity gradient and vertical intensity gradient are determined over all pixels in each cell. For each pair of cells, binary tests are performed to determine which cell has greater average intensity and gradients. The test results over all cell pairs are concatenated into a binary string that represents the appearance signature of the image region. A region of interest is compared to a candidate region by determining the Hamming distance between respective appearance signatures of the regions. However, the average intensity and gradients are not very descriptive of the distribution of pixel values within a region. Further, binary differences are sensitive to noise in homogeneous regions, and do not characterize the magnitude of the difference between pairs of regions.
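A minimal sketch of such a binary signature follows. It assumes the region is a greyscale image stored as a list of rows, uses forward differences clamped at the image border as the gradient estimate, and encodes each test as a single bit; these choices and the names `binary_signature` and `hamming` are illustrative assumptions rather than details of the method described.

```python
from itertools import combinations


def cell_stats(image, rows, cols):
    """Return (mean intensity, mean horizontal gradient, mean vertical gradient)
    for each cell of a rows x cols grid laid over the image."""
    h, w = len(image), len(image[0])
    stats = []
    for r in range(rows):
        for c in range(cols):
            ys = range(r * h // rows, (r + 1) * h // rows)
            xs = range(c * w // cols, (c + 1) * w // cols)
            n = len(ys) * len(xs)
            mean = sum(image[y][x] for y in ys for x in xs) / n
            # Forward differences, clamped at the border (an assumption).
            gx = sum(image[y][min(x + 1, w - 1)] - image[y][x]
                     for y in ys for x in xs) / n
            gy = sum(image[min(y + 1, h - 1)][x] - image[y][x]
                     for y in ys for x in xs) / n
            stats.append((mean, gx, gy))
    return stats


def binary_signature(image, rows=2, cols=2):
    """Concatenate one bit per statistic for every pair of cells."""
    bits = []
    for a, b in combinations(cell_stats(image, rows, cols), 2):
        bits.extend(int(sa > sb) for sa, sb in zip(a, b))
    return bits


def hamming(s1, s2):
    """Count the positions at which two equal-length bit lists differ."""
    if len(s1) != len(s2):
        raise ValueError("signatures must have equal length")
    return sum(x != y for x, y in zip(s1, s2))
```

Since each bit records only which cell is greater, a uniform change in gain and offset leaves the signature unchanged, but the size of each difference is discarded, and a small amount of noise can flip bits for cells with nearly equal statistics.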
Another method for appearance matching under photometric variations relies in part on determining self-similarity. In this self-similarity method, the central patch of a region of interest is correlated with a dense sampling of patches over the entire region. The resulting correlation surface is spatially quantized into a small number of representative correlation values that represent the appearance signature. A region of interest is compared to a candidate region by determining the sum of differences between respective appearance signatures of the regions. This self-similarity method characterizes the geometric shape of a region independently of photometric properties. However, this self-similarity method may not discriminate different objects with similar shape, such as people. Further, this self-similarity method may not match articulated objects under large changes in shape.
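The self-similarity signature can be sketched roughly as follows. This sketch assumes normalised cross-correlation as the correlation measure, a regular grid of spatial bins, and the maximum correlation in each bin as the representative value; the method described may use a different measure and a different quantization, and the names `self_similarity_signature` and `signature_difference` are hypothetical.

```python
import math


def patch(image, y, x, size):
    """Flatten the size x size patch whose top-left corner is (y, x)."""
    return [image[y + dy][x + dx] for dy in range(size) for dx in range(size)]


def ncc(a, b):
    """Normalised cross-correlation of two patches (0.0 if either is flat)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    return 0.0 if denom == 0 else sum(p * q for p, q in zip(da, db)) / denom


def self_similarity_signature(image, patch_size=3, bins=2):
    """Correlate the central patch with every patch in the region, then keep
    the maximum correlation in each cell of a bins x bins spatial grid."""
    h, w = len(image), len(image[0])
    ny, nx = h - patch_size + 1, w - patch_size + 1  # valid patch positions
    if ny < bins or nx < bins:
        raise ValueError("region too small for the requested bins")
    centre = patch(image, (h - patch_size) // 2, (w - patch_size) // 2,
                   patch_size)
    surface = [[ncc(centre, patch(image, y, x, patch_size))
                for x in range(nx)] for y in range(ny)]
    signature = []
    for by in range(bins):
        for bx in range(bins):
            ys = range(by * ny // bins, (by + 1) * ny // bins)
            xs = range(bx * nx // bins, (bx + 1) * nx // bins)
            signature.append(max(surface[y][x] for y in ys for x in xs))
    return signature


def signature_difference(s1, s2):
    """Sum of absolute differences between two signatures."""
    return sum(abs(a - b) for a, b in zip(s1, s2))
```

The signature records only how the region resembles its own centre, so it is unaffected by a uniform gain or offset change, but two different objects with a similar internal layout would be expected to produce similar signatures.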
Another method for modelling appearance under photometric variations is used to classify image regions as objects or background in thermal infrared images. A region of interest is divided into a regular grid of cells, and the average pixel intensity is determined for each cell. The pairwise average intensity differences between each cell and a predetermined reference cell are concatenated to determine an appearance signature. A binary classifier is trained to discriminate objects from background using appearance signatures from a training set of labelled regions. However, the determined appearance signature is sensitive to the unpredictable content of the predetermined reference cell and to changes in overall contrast in the region.
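The signature construction in this method can be sketched as below, leaving the classifier aside. The sketch assumes a greyscale region stored as a list of rows and takes the first cell as the reference cell; the name `reference_difference_signature` and the parameter `ref_index` are hypothetical.

```python
def cell_means(image, rows, cols):
    """Average pixel intensity of each cell in a rows x cols grid."""
    h, w = len(image), len(image[0])
    means = []
    for r in range(rows):
        for c in range(cols):
            ys = range(r * h // rows, (r + 1) * h // rows)
            xs = range(c * w // cols, (c + 1) * w // cols)
            total = sum(image[y][x] for y in ys for x in xs)
            means.append(total / (len(ys) * len(xs)))
    return means


def reference_difference_signature(image, rows=2, cols=2, ref_index=0):
    """Concatenate the difference between each cell mean and the mean of
    the reference cell (the reference cell itself is skipped)."""
    means = cell_means(image, rows, cols)
    ref = means[ref_index]
    return [m - ref for i, m in enumerate(means) if i != ref_index]
```

Subtracting the reference cell cancels a uniform brightness offset, but every element of the signature scales with the overall contrast and depends on whatever happens to fall inside the reference cell, which illustrates the two sensitivities noted above.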