The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
“Biometrics” refers to unique physiological and/or behavioral characteristics of a person that can be measured or identified. Example characteristics include height, weight, shape, fingerprints, retina patterns, skin and hair color, and voice patterns. Identification systems that use biometrics are becoming increasingly important security tools. Identification systems that recognize irises, voices or fingerprints have been developed and are in use. These systems provide highly reliable identification, but require special equipment to read the intended biometric (e.g., a fingerprint pad or an eye scanner). Because of the expense of providing special equipment for gathering these types of biometric data, facial recognition systems requiring only a simple video camera for capturing an image of a face have also been developed.
In terms of equipment costs and user-friendliness, facial recognition systems provide many advantages that other biometric identification systems cannot. For example, face recognition does not require direct contact with a user and is achievable from relatively long distances, unlike most other types of biometric techniques, such as fingerprint and retina pattern recognition. In addition, face recognition may be combined with other image identification methods that use the same input images. For example, height and weight estimation based on comparison to known reference objects within the visual field may use the same image as face recognition, thereby providing more identification data without any extra equipment.
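As an illustration of estimation by comparison to a known reference object, the following minimal sketch scales a reference object's known height by the ratio of pixel heights in the same image. It assumes, for simplicity, that the person and the reference object are at roughly the same distance from the camera and that no perspective or lens correction is needed. The function name `estimate_height_cm` and the numeric values are hypothetical and chosen only for illustration.

```python
def estimate_height_cm(person_px: float, reference_px: float, reference_cm: float) -> float:
    """Scale a known reference height by the ratio of pixel heights.

    Assumption (illustrative only): the person and the reference object
    are at about the same distance from the camera.
    """
    if reference_px <= 0:
        raise ValueError("reference_px must be positive")
    return reference_cm * (person_px / reference_px)


# Hypothetical example: a door frame known to be 200 cm tall spans 400 pixels,
# and the person standing in it spans 350 pixels in the same image.
height = estimate_height_cm(person_px=350, reference_px=400, reference_cm=200)
```

A practical system would likely also need to account for camera angle and distance, which this sketch deliberately ignores.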
However, facial recognition systems can have large error rates. In order to provide the most reliable and accurate results, current facial recognition systems typically require a person who is to be identified to stand in a certain position with a consistent facial expression, facing a particular direction, in front of a known background and under optimal lighting conditions. Only by eliminating variations in the environment is it possible for facial recognition systems to reliably identify a person. Without these types of constraints in place, the accuracy rate of a facial recognition system is poor, and therefore facial recognition systems in use today are dedicated systems that are only used for recognition purposes under strictly controlled conditions.
Video surveillance is a common security technology that has been used for many years, and the equipment (i.e., video camera) used to set up a video surveillance system is inexpensive and widely available. A video surveillance system operates in a naturalistic environment, however, where conditions change constantly. A surveillance system may use multiple cameras in a variety of locations, each camera fixed at a different angle, focusing on variable backgrounds and operating under different lighting conditions. Therefore, images from surveillance systems may have various side-view and/or top-view angles and may be taken in widely varying lighting conditions. Additionally, the expression of the human face varies constantly. Comparing facial images captured at an off-angle and in poor lighting with facial images taken at a direct angle in well-lit conditions (i.e., typical images in a reference database) results in a high recognition error rate.
In a controlled environment, such as an entry vestibule with a dedicated facial recognition security camera, the comparison of a target face to a library of authorized faces is a relatively straightforward process. An image of each of the authorized individuals will have been collected using an appropriate pose in a well-lit area. The person requesting entry to the secured facility will be instructed to stand at a certain point relative to the camera, to most closely match the environment in which the images of the authorized people were collected.
For video surveillance systems, however, requiring the target individual to pose is an unrealistic restriction. Most security systems are designed to be unobtrusive, so as not to impede the normal course of business or travel, and would quickly become unusable if each person traveling through an area were required to stop and pose. Furthermore, video surveillance systems frequently use multiple cameras to cover multiple areas and especially multiple entry points to a secure area. Thus, the target image may be obtained under various conditions, and will generally not correspond directly to the pose and orientation of the images in a library of images.
When capturing multiple images of individuals and other “objects,” it is important to group the image sets of a single object with each other. A group of one or more image sets of a particular object is referred to as an “event.” The image sets of an event are used to compare the event with other events in the surveillance system. Before such comparisons are made, one or more image sets of a particular object are captured and must be grouped together to form a single event.
For example, when an individual walks past a security checkpoint, a camera may capture multiple images (e.g., video) of the individual. It is desirable to group all images of the individual captured at that time into a single event. Also, a second camera at the security checkpoint may capture multiple images of the same individual but from a different angle. It may also be desirable to group all images from the second camera with the images from the first camera.
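The relationship between image sets and events can be sketched as a simple data structure. The following is a minimal illustration only, not a described implementation: the class names `ImageSet` and `Event`, the function `group_into_events`, and all identifiers are hypothetical. In particular, the sketch assumes each image set already carries a known `object_id`, which sidesteps the actual problem of deciding which object an image set belongs to.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ImageSet:
    """Images of one object captured by one camera (hypothetical structure)."""
    camera_id: str
    object_id: str      # assumed to be known here; illustrative only
    frames: List[str]   # placeholder frame identifiers


@dataclass
class Event:
    """A group of one or more image sets of a single object."""
    object_id: str
    image_sets: List[ImageSet] = field(default_factory=list)


def group_into_events(image_sets: List[ImageSet]) -> Dict[str, Event]:
    """Group image sets by object, regardless of which camera produced them."""
    events: Dict[str, Event] = {}
    for image_set in image_sets:
        event = events.setdefault(image_set.object_id, Event(image_set.object_id))
        event.image_sets.append(image_set)
    return events


# Two cameras at a checkpoint capture the same person; one also captures another.
captured = [
    ImageSet("camera-1", "person-A", ["f1", "f2", "f3"]),
    ImageSet("camera-2", "person-A", ["g1", "g2"]),
    ImageSet("camera-1", "person-B", ["f4"]),
]
events = group_into_events(captured)
```

Here the image sets from both cameras for "person-A" end up in one event, while "person-B" forms a separate event.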
Determining the group to which a new image set should be added, and determining when to close a group (i.e., when the group becomes an event), is difficult given the various conditions under which multiple images are obtained and the possibility of multiple cameras capturing multiple images of the same object.