1. Technical Field
The present invention is generally directed to a system and method for assuring high resolution imaging of distinctive characteristics of a moving object. More specifically, the present invention is directed to a system and method for assuring high resolution imaging of faces of persons passing through a targeted space.
2. Description of Related Art
In many security applications, high resolution images and video of certain objects are desired for robust object identification. In general, known systems employ only wide-angle cameras to monitor a scene, and thus cannot generate a detailed view of a particular location or object in the scene. In only a few cases, pan-tilt-zoom (PTZ, or active) cameras are used to obtain higher resolution views of the interesting parts of a scene.
U.S. patent application 20030122667, entitled “System and Method for Enhancing Security at a Self-Checkout Station” (Flynn, S. W.), employs PTZ cameras in a supermarket to focus on self-checkout stations at which high-priority non-visual alerts are generated. Upon request, the system zooms to a predefined location for each checkout station. Hence, even a slight offset of the customer from the assumed coordinates will prevent the system from capturing the customer in the high resolution image.
In U.S. patent application 20020063711, entitled “Camera System with High Resolution Image Inside a Wide Angle View” (Park, M. C. and Ripley, G. D.), a PTZ camera is used to provide a high resolution image of an area within a panoramic view generated by multiple single-lens cameras. The system lacks automatic detection of interesting segments and requires manual specification of the area on which to focus. Hence, this system is labor-intensive to use.
In U.S. patent application 20020030741, entitled “Method and Apparatus for Object Surveillance with a Movable Camera” (Broemmelsiek, R. M.), an active camera maintains the object in the center of its field of view. This is primarily a tracking system in which pan and tilt commands are executed for lateral movements of the person, while the zoom value is adjusted when the object moves toward or away from the camera. Broemmelsiek's system adjusts the zoom value to keep the object small enough that it can be tracked reliably with a minimum number of pan and tilt commands.
In some systems, salient color features of the object are used for detection. The technical report by S. Stillman, R. Tanawongsuwan, and I. Essa, entitled “A System for Tracking and Recognizing Multiple People with Multiple Cameras,” Georgia Tech Technical Report GIT-GVU-98-25, Aug. 1998, discloses the use of two wide-angle cameras to observe a global view of a scene while two PTZ cameras obtain higher resolution images of two people in the scene. The proposed system operates by first detecting skin (flesh) color pixels in the image data of the single-lens cameras. Connected skin color regions are then found by morphological operators and evaluated against shape and size constraints, so that the two skin colored regions with the highest face likelihood values are retained. Each PTZ camera is assigned to one distinct skin region and zooms in to capture a high resolution image of the respective region. The system employs a face recognition engine, the Face-It developer kit from Identix, to verify whether a skin color blob corresponds to one of the pre-registered faces in a database. Similar to the system developed by Stillman et al., U.S. patent application 20030142209 (Yamazaki, S. and Tanibuchi, K.) also treats flesh color as an indicator of a face and captures high resolution views of flesh color regions with PTZ cameras.
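The skin-color detection and region-grouping steps described above can be illustrated with a minimal sketch. The RGB thresholds below and the 4-connected flood fill (standing in for the morphological grouping step) are assumptions for illustration, not the exact rules used by Stillman et al.:

```python
import numpy as np

def skin_mask(img):
    """Per-pixel skin test using a simple RGB rule (thresholds are illustrative)."""
    r, g, b = (img[..., k].astype(int) for k in range(3))
    return (r > 95) & (g > 40) & (b > 20) & (r > g) & (r > b) \
        & (r - np.minimum(g, b) > 15)

def connected_regions(mask):
    """Group skin pixels into 4-connected regions via iterative flood fill;
    returns the pixel count of each region found."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    sizes = []
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                stack, count = [(i, j)], 0
                seen[i, j] = True
                while stack:
                    y, x = stack.pop()
                    count += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                sizes.append(count)
    return sizes
```

A system of the kind Stillman et al. describe would then rank these regions by shape and size constraints and retain the two most face-like regions for the PTZ cameras.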
Although skin (flesh) color is a necessary feature of a face region, it is not a sufficient condition, because visible non-face body parts are indistinguishable from the face by skin color alone. Furthermore, there may be skin colored objects in the environment, such as wooden furniture or doors, which result in increased false alarms. Finally, although color is a very useful feature in computer vision, it is known to be highly sensitive to illumination direction, intensity, surface reflection properties, atmospheric conditions, and many other imaging and environmental factors.
Because of the motion of the active camera during the execution of pan, tilt, and zoom commands, and because of possible object motion, high resolution images captured by a PTZ camera may be contaminated with motion blur. The implication is that although the system assumes the resolution of the captured image is high enough, the quality of the image may not be sufficient for certain applications, or even for visual inspection. Some systems developed for license plate reading share similar concerns about the quality of images, which may be blurred due to fast vehicle motion. An example of such a license plate reading system is described in U.S. patent application 20020186148, entitled “Combined Laser/Radar-Video Speed Violation Detector for Law Enforcement” (Trajkovic, M. et al.). In this system, active cameras are employed for image enhancement. Another license plate imaging system is described in U.S. patent application 20030174865, entitled “Vehicle License Plate Imaging and Reading System for Day and Night” (Vernon, M. W.), where the effects of illumination for day and night vision are taken into account when adjusting camera parameters. U.S. Pat. No. 6,433,706, entitled “License Plate Surveillance System” (Anderson III et al.), is yet another license plate reading system; it employs a camera mounted on a moving vehicle. None of these systems adequately addresses the problems associated with blurring due to movement of the active camera.
In the system of U.S. Pat. No. 6,700,487, entitled “Method and Apparatus to Select the Best Video Frame to Transmit to a Remote Station for CCTV Based Residential Security Monitoring” (Lyons et al.), one frame per event is selected and sent to the monitoring site to check for false alarms. The system deals only with static cameras; thus, the problems associated with active cameras are not investigated.
U.S. patent application 20030068100, entitled “Automatic Selection of a Visual Image or Images from a Collection of Visual Images, Based on an Evaluation of the Quality of the Visual Images” (Covell et al.), proposes a quality evaluation scheme. For face images, this scheme recommends a feature point analysis, such as the openness of both eyes, together with a color-based flesh tone analysis. Camera motion is considered a cue for the start of something interesting.
Patents that provide solutions for key frame extraction from video, such as U.S. Pat. No. 6,252,975, entitled “Method and System for Real Time Feature Based Motion Analysis for Key Frame Selection from a Video” (Bozdagi et al.), and U.S. Pat. No. 6,393,054, entitled “System and Method for Automatically Detecting Shot Boundary and Key Frame from a Compressed Video Data” (Altunbasak et al.), solve a different type of problem, in which the key frames represent content changes within frame sequences of comparable quality.
Known camera systems thus do not provide a robust camera system that assures a high resolution image of an object passing through a targeted space. The known systems suffer from the various problems noted above, which may cause the resulting images obtained from the camera system to have a resolution that is less than optimum for visual inspection or for use with certain applications. Therefore, it would be beneficial to have an improved image capture system for assuring high resolution images of objects passing through a targeted space.
SUMMARY OF THE INVENTION
The present invention provides a system and method for assuring a high resolution image of an object, such as the face of a person, passing through a targeted space. The present invention makes use of stationary cameras and active, or pan-tilt-zoom, cameras. In one exemplary embodiment, the system comprises at least one stationary camera and a plurality of active cameras. The at least one stationary camera acts as a trigger point such that when a person passes through a predefined targeted area of the at least one stationary camera, the system is triggered for object imaging and tracking. Upon the occurrence of a triggering event in the system, e.g., a person traveling through the predefined targeted area, the system predicts the motion of the person based on differences between frames of images obtained from the stationary camera. Other triggering events may be detected using one or more visual, infra-red, mechanical, and/or magnetic sensors.
Based on the predicted motion of the person, a position of the person at a future time may be predicted. Based on this predicted position of the person, an active camera that is capable of obtaining an image of the predicted position is selected and may be controlled to focus its image capture area on the predicted position of the person. The active cameras may then perform face detection on images captured from the predicted position of the person. This process may be repeated continuously while the person is in the targeted area. In addition, an analysis of the frame-by-frame discrepancies of the active cameras may be utilized to aid in centering the object in their image capture areas.
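The frame-difference detection and position prediction described above can be sketched as follows. This is a minimal illustration assuming grayscale frames, a fixed change threshold, and a constant-velocity motion model; the invention does not prescribe a particular detector or estimator:

```python
import numpy as np

def moving_centroid(prev_frame, curr_frame, thresh=30):
    """Centroid (x, y) of pixels that changed between two grayscale frames,
    i.e., a crude frame-difference localization of the moving person."""
    diff = np.abs(curr_frame.astype(int) - prev_frame.astype(int)) > thresh
    ys, xs = np.nonzero(diff)
    if xs.size == 0:
        return None  # no motion detected in the targeted area
    return float(xs.mean()), float(ys.mean())

def predict_position(p_prev, p_curr, dt_ahead=1.0):
    """Extrapolate the position dt_ahead steps into the future from two
    successive centroids, so an active camera can be aimed at that point
    before the person actually arrives there."""
    vx = p_curr[0] - p_prev[0]
    vy = p_curr[1] - p_prev[1]
    return p_curr[0] + vx * dt_ahead, p_curr[1] + vy * dt_ahead
```

In a full system, the predicted image coordinates would be mapped through the camera geometry to pan and tilt angles for the selected active camera.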
After the active camera control and image capture processes, the system evaluates the quality of the captured face images, reports the result to the security agents, and interacts with the user. The quality of the captured face images may be evaluated using any number of different algorithms. In one preferred embodiment, the quality of the captured face images is determined by comparing neighboring pixels over the entire image. If there are no large discrepancies between neighboring pixels overall, e.g., discrepancies greater than one or more predetermined thresholds, then the image is determined not to be a good quality image, since blurring is most likely present such that edges between features are not discernible. In another preferred embodiment, the quality of the image may be determined by taking the pixel values of every even (or odd) frame of the captured images and then attempting to predict the pixel values of the odd (or even) frames. If the discrepancies between the predicted frame pixel values and the actual captured frame pixel values are greater than one or more predetermined thresholds, then the original image is not a good quality image.
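The two quality measures described above can be sketched as simple scores. The averaging-based frame predictor and the use of raw mean absolute differences are illustrative assumptions; the thresholds against which these scores would be compared are application-dependent:

```python
import numpy as np

def neighbor_discrepancy(img):
    """Mean absolute difference between horizontally and vertically adjacent
    pixels; a low score indicates few sharp edges, suggesting a blurred
    (poor quality) image."""
    img = img.astype(float)
    dx = np.abs(np.diff(img, axis=1)).mean()
    dy = np.abs(np.diff(img, axis=0)).mean()
    return (dx + dy) / 2.0

def even_odd_prediction_error(frames):
    """Predict each odd-indexed frame as the average of its even-indexed
    neighbors and return the mean absolute prediction error over the
    captured sequence."""
    frames = np.asarray(frames, dtype=float)
    errors = []
    for k in range(1, len(frames) - 1, 2):
        predicted = (frames[k - 1] + frames[k + 1]) / 2.0
        errors.append(np.abs(predicted - frames[k]).mean())
    return float(np.mean(errors)) if errors else 0.0
```

Either score would then be compared against the predetermined thresholds to decide whether the captured face image is of good quality or whether additional action is needed.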
The results of the quality analysis of the captured images may be provided to a user or security personnel as feedback to inform them if additional action is necessary. For example, in a security checkpoint application, the feedback from the present invention may be utilized to inform the security personnel that additional action is necessary in order to make sure that a good quality image of a person passing through the checkpoint is obtained. This may involve asking the person to stand and face one of the cameras so that their image may be captured.
The present invention solves the problems of the known systems by providing an object position prediction aspect to active camera imaging. That is, because stationary cameras are used to determine the motion of the object through the targeted space, a predicted position of the object is determined so that the active cameras can be controlled to train their image capture areas on the predicted position of the object. This means that the active camera is moved to the correct orientation prior to the object actually being in the predicted position. As a result, the active camera will be at rest when the object arrives at the predicted position and there is less likelihood of blurring due to the movement of the active camera.
In addition, because the present invention uses an image quality evaluation engine to evaluate the images that are captured during the actual image capturing operations, a real-time determination may be made as to whether additional action is necessary to obtain a good quality image of a particular object. As a result, the object may be placed in a position where a good quality image is assured to be captured. This solves the problem of the known systems, in which image quality analysis may be performed long after the actual images are obtained and long after the objects are no longer available for imaging.
These and other features and advantages of the present invention will be described in, or will become apparent to those of ordinary skill in the art in view of, the following detailed description of the exemplary embodiments.