In video surveillance systems, it is often useful to summarize the collected surveillance video of people in the monitored scene by images of the faces visible in the original video sequence. The resulting sequence of face images is referred to as a face image log. Whether reviewed by security personnel or by an automated system, and whether processed in real time or upon request, these logs allow investigators to determine who was in the vicinity of the surveillance camera at any particular moment in time without having to view the video sequence itself.
In general, face image logs need to be complete in the sense that they should contain, at the very least, one high quality image for each individual whose face appeared unobstructed in the original video. High quality images are important because they maximize the probability that an individual will be correctly identified.
The most direct approach to constructing complete face image logs involves using existing face detection technologies to extract face images directly from video sequences, and immediately appending each of these detections to the face log. In this scenario, one face may be detected per person per frame. Surveillance footage captured at 15 frames per second could therefore yield up to 900 face images per person per minute (15 frames per second × 60 seconds). Such a high rate of detections could easily overwhelm any human operator or automated biometric face recognition system that might be trying to process the face image log in real time. Whether or not it is performed in real time, much of this processing is wasteful, since each individual may appear in the log numerous times.
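The growth of such a log can be illustrated with a minimal Python sketch. It assumes the best case for the detector, in which every person is detected in every frame. The person labels "A" and "B" and the function name `build_naive_log` are hypothetical and serve only to make the redundancy countable; the direct approach described above does not itself associate detections with identities.

```python
from collections import Counter

FPS = 15       # frame rate quoted in the text
SECONDS = 60   # one minute of footage


def build_naive_log(person_ids, fps, seconds):
    """Append one detection per person per frame, with no filtering."""
    log = []
    for frame in range(int(fps * seconds)):
        for pid in person_ids:
            # Hypothetical entry: (frame index, person label)
            log.append((frame, pid))
    return log


log = build_naive_log(["A", "B"], FPS, SECONDS)

# Number of log entries contributed by each person
per_person = Counter(pid for _, pid in log)

# Entries beyond one image per person are redundant for a "complete" log
redundant = len(log) - len(per_person)

print(per_person["A"])  # entries for a single person in one minute
print(len(log))         # total entries for two people
print(redundant)        # entries beyond one per person
```

Under these assumptions, each person contributes 15 × 60 = 900 entries per minute, so nearly all of the log is redundant with respect to the requirement of one high quality image per individual.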
Face image validation systems have been used to analyze images of faces to determine whether they are suitable for use in identification documents such as passports, but they have been limited in their applicability to real-time applications. While such systems do provide a numeric quality score for input images, the score is mainly used to detect images that do not meet the criteria established by the International Civil Aviation Organization, and their decisions are inherently binary. In addition, the face images are acquired in a relatively controlled manner, with only a limited range of variation in subject conditions. In contrast, a more continuous appraisal system is required when selecting high quality face images from video sequences.
Accordingly, there is a need for quality appraisal of face images for the purpose of selecting high quality faces from video sequences.
It will be noted that throughout the appended drawings, like features are identified by like reference numerals.