Companies or governments often employ surveillance systems to monitor activities within their territories and to enhance security. One topic of surveillance deals with detection of a suspicious object and re-identification of the object from a pool of objects. For example, in a garage building, a surveillance system may detect a suspicious individual leaving at a particular moment and then identify the individual from a number of known people.
An intelligent surveillance system relies on computer technologies to detect and identify objects. Such a system generally includes video acquisition means, such as video cameras, to acquire images of objects, and a computer system to process the images and match objects. To build the pool of objects, the video acquisition means can capture images of all objects of interest and provide the images to the computer system. The computer system processes the images and stores them in certain formats in a database. When the video acquisition means acquires an image of a suspicious object, the computer system processes the image and tries to match it with the image(s) of an object in the database.
The images of the same object at different moments or in different settings can be substantially different, because the posture of the object and environmental factors such as lighting may vary. Therefore, attempting to match two images of the same object from two moments or two different settings by comparing every detail of the images is not only a prohibitive task for the computer, but can also generate unreliable results. Accordingly, a computer in a surveillance system generally reduces and abstracts the available information in an image using statistical models. As an example, a histogram can be used to depict a distribution of data over a range of a variable parameter. More specifically, the range of the parameter can be divided into smaller ranges or subranges, and the histogram can include multiple bins, each bin having a width equal to the size of a corresponding subrange and a height corresponding to the portion of the data falling into the corresponding subrange. For example, a color image can be represented with multiple color channels, such as red, green, and blue channels, each containing a specific color component of the image. A histogram of the color image can show the distribution of pixels in a specific color channel, with the width of a bin corresponding to a subrange of that color channel and the height of a bin corresponding to the number of pixels whose values in that channel fall within the respective subrange. Re-identification then involves comparing the histograms of two objects. If the histograms of two objects match, the two objects may be considered to match each other.
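The histogram construction and comparison described above can be sketched as follows. This is a minimal illustration rather than the method of any particular system: it assumes pixels are given as 8-bit (red, green, blue) tuples, and the function names `channel_histogram` and `histogram_intersection`, the choice of 8 bins, and the use of histogram intersection as the similarity measure are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# Assumption: a pixel is an (red, green, blue) tuple of 8-bit values (0..255).
Pixel = Tuple[int, int, int]


def channel_histogram(pixels: Sequence[Pixel], channel: int, bins: int = 8) -> List[float]:
    """Return a normalized histogram of one color channel.

    The 0..255 range is divided into `bins` equal subranges; each bin's
    height is the fraction of pixels whose channel value falls in it.
    """
    if not pixels:
        raise ValueError("image has no pixels")
    width = 256 / bins  # size of each subrange
    counts = [0] * bins
    for pixel in pixels:
        index = min(int(pixel[channel] / width), bins - 1)
        counts[index] += 1
    total = len(pixels)
    return [count / total for count in counts]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1] for normalized histograms; 1.0 means identical."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical flattened images: two mostly red objects and one blue object.
reddish = [(200, 30, 30)] * 90 + [(40, 40, 40)] * 10
similar = [(210, 25, 35)] * 85 + [(50, 45, 40)] * 15
bluish = [(20, 30, 220)] * 100

RED = 0  # index of the red channel in a pixel tuple
score_similar = histogram_intersection(
    channel_histogram(reddish, RED), channel_histogram(similar, RED)
)
score_different = histogram_intersection(
    channel_histogram(reddish, RED), channel_histogram(bluish, RED)
)
```

In this toy data the two reddish images produce a high intersection score on the red channel while the blue image produces a score of zero, which is the sense in which two histograms may be said to match. A practical system would typically combine several channels and choose a decision threshold empirically.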
The more information the computer has about the pool of objects, the more accurate the re-identification process is. A surveillance system can provide multiple images of each of the objects in the pool, through a video camera capturing multiple images in succession and/or through multiple cameras capturing multiple images simultaneously. Thus, the computer can generate multiple histograms for each object using the multiple images thereof. In re-identifying a suspicious object, the computer may compare the histogram of the suspicious object against all of the histograms of each object in the pool, thereby improving accuracy. Alternatively, the computer may combine the multiple images of an object to create an appearance model to represent the object. In re-identifying a suspicious object, the computer then determines whether the image of the suspicious object fits the appearance model of an object in the pool.
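The first strategy above, comparing one histogram of the suspicious object against every stored histogram of every object in the pool, might look like the sketch below. It is illustrative only: the name `re_identify`, the rule of scoring each pool object by its best-matching histogram, the 0.8 threshold, and the object identifiers are assumptions for the example, not details taken from any cited system.

```python
from typing import Dict, List, Optional, Sequence

Histogram = List[float]  # assumption: normalized, same bin count everywhere


def intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def re_identify(
    probe: Histogram,
    pool: Dict[str, List[Histogram]],
    threshold: float = 0.8,
) -> Optional[str]:
    """Return the ID of the best-matching pool object, or None.

    Each pool object is scored by its most similar stored histogram,
    so one good view of the object is enough to produce a match.
    """
    best_id: Optional[str] = None
    best_score = 0.0
    for object_id, references in pool.items():
        if not references:
            continue  # no stored histograms for this object
        score = max(intersection(probe, ref) for ref in references)
        if score > best_score:
            best_id, best_score = object_id, score
    return best_id if best_score >= threshold else None


# Hypothetical pool: two histograms for one object, one for another.
pool = {
    "person_a": [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]],
    "person_b": [[0.1, 0.2, 0.7]],
}

match = re_identify([0.55, 0.35, 0.10], pool)     # close to person_a's second view
no_match = re_identify([0.34, 0.33, 0.33], pool)  # resembles nothing in the pool
```

Taking the maximum over an object's histograms is one possible design choice; averaging the scores would instead favor objects that resemble the probe consistently across all of their stored views.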
Researchers have proposed different appearance models for different purposes. For example, in “Global Color Model Based Object Matching in the Multi-Camera Environment,” Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, Oct. 9-15, 2006, pp. 2644-49, Morioka et al. proposed an adaptive model that combines multiple histograms of an object and reduces the dimension of the histograms by using principal component analysis. Thus, information from all the histograms is integrated. This approach, however, may experience difficulty when the view of an object is obstructed in an image.
In “Appearance models for occlusion handling,” Image and Vision Computing, 24(11), 1233-43 (2006), Senior et al. proposed using per-pixel statistics to create the appearance model. This approach may be inefficient in correlating two images when the object has changed posture. In “Distinctive Image Features from Scale-Invariant Keypoints,” International Journal of Computer Vision, 60(2), 91-110, 2004, Lowe described a method called SIFT (Scale-Invariant Feature Transform) to extract representative interest points that tend to remain invariant when the environment changes. However, the amount of computation required by the SIFT approach may become impractical as the number of images grows.
Others have proposed to divide an object into multiple parts and to create an appearance model for each part. However, identifying the respective parts across multiple images is a challenging task for a computer.