Many recognition algorithms exist for detecting the presence of a person. In some cases only the face is detected; in other instances the person's whole body is detected. These detection algorithms can be used in a number of applications, such as, but not limited to, driver monitoring, user identification and providing information about occupants of a vehicle. In the latter instance, occupant detection and information about the occupant can be used to control aspects of a vehicle such as air bag deployment.
If the presence of an occupant can be detected, then an initial determination can be made as to whether an air bag needs to be deployed. An unoccupied seat does not require air bag deployment. Furthermore, if the size of the occupant and the position of the occupant relative to the dashboard can be ascertained, then the settings of the air bag can be adjusted to provide the appropriate amount of protection for the occupant. For example, if the occupant is a child or someone sitting close to the dashboard, the air bag can be adjusted to be deployed with a lower amount of force than for an adult or occupant sitting a reasonable distance from the dashboard.
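The decision logic described above can be sketched as a small rule set. This is only an illustration of the reasoning, not a safety-rated implementation: the `Occupant` fields, the `Deployment` levels and the `MIN_SAFE_DISTANCE_M` threshold are hypothetical names and values introduced here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Deployment(Enum):
    SUPPRESSED = "suppressed"  # seat is empty, no deployment needed
    REDUCED = "reduced"        # deploy with a lower amount of force
    FULL = "full"              # deploy with the normal amount of force


@dataclass
class Occupant:
    is_child: bool                  # assumed output of a size classifier
    distance_to_dashboard_m: float  # assumed output of a position estimator


# Hypothetical threshold, chosen only for illustration.
MIN_SAFE_DISTANCE_M = 0.25


def select_deployment(occupant: Optional[Occupant]) -> Deployment:
    """Map occupant detection results to an air bag setting."""
    if occupant is None:
        # An unoccupied seat does not require air bag deployment.
        return Deployment.SUPPRESSED
    if occupant.is_child or occupant.distance_to_dashboard_m < MIN_SAFE_DISTANCE_M:
        # A child, or someone sitting close to the dashboard, gets less force.
        return Deployment.REDUCED
    return Deployment.FULL
```

For example, `select_deployment(None)` yields the suppressed setting, while an adult seated 0.6 m from the dashboard yields the full setting.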
In the case of driver monitoring, another useful application is the determination of the head pose of an occupant of a vehicle such as a driver. Information pertaining to head pose can be used to make assumptions regarding the state of the driver. For example, if the head is tilted, it may indicate that the driver is asleep or otherwise incapacitated. Or, if the head is turned, it can indicate that the driver is not being attentive to the road. Head pose estimation deals with finding the pose of the human head from one or more video streams. The problem is connected to face detection because if the location of the face is known, it becomes simpler to determine the pose. Likewise, if the pose is known, it is easier to detect the face. When two or more cameras are used, the fusion of information coming from different video streams represents a key source for head pose estimation.
The most common approaches for estimating head pose rely on feature point based methods, multi-view methods and three-dimensional (3D) modeling of the face. Feature point based methods, also sometimes referred to as feature-based geometrical methods, try to locate landmark points on a face and compute the orientation of the face from the point locations. If a generic face model is assumed for all faces, and face symmetry and ratio consistency are taken into consideration, pose can be estimated with only two-dimensional (2D) feature point information. Some methods use points near the eyes and mouth, together with the coplanar constraint of the points, to estimate the pose. Another method uses only points from the areas near the eyes and a point on the nose tip.
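As a minimal sketch of the eyes-plus-nose-tip idea, the function below estimates yaw alone from three 2D points. It is not any particular published method. It assumes an orthographic projection, no head roll, eyes placed symmetrically about the head's vertical axis, and a generic-model constant `NOSE_DEPTH_RATIO` (nose-tip protrusion divided by inter-eye distance) whose value here is hypothetical.

```python
import math

# Hypothetical generic-face-model constant: how far the nose tip sticks out
# in front of the eye plane, as a fraction of the inter-eye distance.
NOSE_DEPTH_RATIO = 0.5


def estimate_yaw_deg(left_eye, right_eye, nose_tip,
                     nose_depth_ratio=NOSE_DEPTH_RATIO):
    """Estimate head yaw in degrees from three 2D image points (x, y).

    Under the stated assumptions, the projected inter-eye distance shrinks
    with cos(yaw) while the nose tip shifts sideways from the eye midpoint
    with sin(yaw), so their ratio gives tan(yaw).
    Positive yaw means the nose tip moves toward +x in the image.
    """
    eye_dx = right_eye[0] - left_eye[0]
    if eye_dx <= 0:
        raise ValueError("right_eye must lie to the right of left_eye")
    mid_x = (left_eye[0] + right_eye[0]) / 2.0
    nose_offset = nose_tip[0] - mid_x
    return math.degrees(math.atan2(nose_offset, nose_depth_ratio * eye_dx))
```

Because only ratios of image distances are used, the estimate does not depend on the scale of the face in the image, which is what makes a single generic model workable.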
Another approach uses multiple cameras to compute 3D landmark positions. The key problem is to locate the feature points accurately, so that the correspondence error is minimized and accurate 3D positions can be built. The feature point detection can be done with template matching, wavelet analysis or more involved detectors. While these methods allow for continuous pose estimation, the feature points are difficult to locate, a difficulty that is compounded when a person's facial expression changes. In addition, the techniques often require manual initialization of the feature points.
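To illustrate how a matched feature point becomes a 3D landmark, the sketch below triangulates one point from a two-camera setup. It assumes the simplest case: a rectified stereo pair with identical pinhole cameras, so the same landmark appears on the same image row in both views. The function name and all parameter values are illustrative assumptions.

```python
def triangulate_rectified(left_px, right_px, focal_px, baseline_m, principal_point):
    """Recover (X, Y, Z) in metres, in the left camera frame.

    left_px, right_px: (x, y) pixel positions of the same landmark.
    focal_px: focal length in pixels; baseline_m: camera separation in metres.
    principal_point: (cx, cy) image centre in pixels.
    """
    (xl, yl), (xr, _) = left_px, right_px
    cx, cy = principal_point
    disparity = xl - xr  # horizontal shift of the landmark between views
    if disparity <= 0:
        raise ValueError("non-positive disparity: bad correspondence")
    z = focal_px * baseline_m / disparity  # depth from similar triangles
    x = (xl - cx) * z / focal_px
    y = (yl - cy) * z / focal_px
    return (x, y, z)


# Illustrative numbers only: 800 px focal length, 12 cm baseline.
landmark = triangulate_rectified((400.0, 200.0), (304.0, 200.0),
                                 focal_px=800.0, baseline_m=0.12,
                                 principal_point=(320.0, 240.0))
```

With these assumed numbers the landmark lies about 1 m from the camera, and shifting the right-image match by a single pixel changes the recovered depth by roughly one centimetre, which is why the correspondence error matters.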
Another approach commonly used to determine head pose is a multi-view based method. Multi-view based methods, which are also called appearance learning based methods, treat the whole face as a vector in a high-dimensional space and do not need to find feature points. Training face examples are collected in discrete views, for example, at yaw angles of 0, 10 and 20 degrees. Each view or pose has faces from different people and under different illumination conditions. The task is then simplified to a view classification problem. Various appearance learning methods can be used to discriminate the different views.
In some cases, a Support Vector Machine (SVM) is employed to classify the view. Other methods involve subspace transformations. In solving face recognition with wide pose changes, one approach detects the pose of the face first and then uses per-view eigenspaces to recognize the face. Principal Component Analysis (PCA) is applied to each view to obtain a per-view eigenspace. The incoming image is then projected onto each view's eigenspace, and the view with the least residual error is taken as the estimated pose of the incoming face. Generally, multi-view approaches do not require stereo information. However, these learning methods need a large training database with many pose-labeled examples.
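The per-view eigenspace procedure can be sketched with NumPy as follows. This is a minimal illustration rather than a tuned implementation: the function names are hypothetical, the number of components is arbitrary, and the "faces" are synthetic random vectors standing in for vectorized images at three yaw views.

```python
import numpy as np


def fit_view_eigenspace(samples, n_components):
    """PCA for one view: return the mean and the top principal directions."""
    mean = samples.mean(axis=0)
    # Rows of vt are the principal directions of the centred samples.
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:n_components]


def residual_error(x, mean, basis):
    """Distance between x and its reconstruction from the view's eigenspace."""
    centred = x - mean
    reconstruction = basis.T @ (basis @ centred)
    return float(np.linalg.norm(centred - reconstruction))


def estimate_view(x, eigenspaces):
    """Pick the view whose eigenspace leaves the least residual error."""
    return min(eigenspaces, key=lambda view: residual_error(x, *eigenspaces[view]))


def make_synthetic_faces(rng, mean, directions, n):
    """Stand-in for real training images: points near a low-dimensional subspace."""
    coeffs = rng.normal(size=(n, directions.shape[0]))
    noise = 0.01 * rng.normal(size=(n, mean.size))
    return mean + coeffs @ directions + noise


rng = np.random.default_rng(0)
# One synthetic generator per yaw view (0, 10 and 20 degrees).
generators = {yaw: (rng.normal(scale=5.0, size=64), rng.normal(size=(3, 64)))
              for yaw in (0, 10, 20)}
eigenspaces = {yaw: fit_view_eigenspace(make_synthetic_faces(rng, m, d, 40), 3)
               for yaw, (m, d) in generators.items()}

probe = make_synthetic_faces(rng, *generators[10], 1)[0]
estimated_yaw = estimate_view(probe, eigenspaces)
```

The estimate is limited to the discrete views present in the training data, which is one reason a large pose-labeled database is needed.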
Three-dimensional model based methods, also referred to as analysis by synthesis, normally assume a generic 3D face/head model. The incoming image is matched to the model iteratively to minimize the matching error, while the model may or may not be adapted to the specific person. When the procedure converges, the pose is solved. Three-dimensional model based methods typically use many feature points. These methods also assume that the measured point locations are noisy when matching to the model iteratively, and are therefore more robust. When the fitting procedure converges, the results are accurate. However, these methods are computationally expensive and very slow, and they can completely miss the target without good initialization.
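The iterative matching loop can be illustrated with a deliberately reduced problem: fitting a single yaw angle of a fixed generic model to observed 2D point positions by Gauss-Newton iteration. Real model-based methods estimate full 3D pose and often adapt the model as well. The point coordinates in `GENERIC_MODEL`, the orthographic projection and the function name are assumptions made for this sketch.

```python
import math

# Hypothetical generic head model: (X, Y, Z) landmarks in millimetres,
# with +Z pointing out of the face toward the camera.
GENERIC_MODEL = [
    (-30.0, 0.0, 0.0),    # left eye
    (30.0, 0.0, 0.0),     # right eye
    (0.0, -30.0, 30.0),   # nose tip
    (-20.0, -60.0, 5.0),  # left mouth corner
    (20.0, -60.0, 5.0),   # right mouth corner
]


def fit_yaw(model_points, observed_x, initial_yaw_rad=0.0,
            max_iters=50, tol=1e-10):
    """Iteratively adjust yaw to minimize the squared matching error.

    A model point rotated by yaw about the vertical axis projects
    (orthographically) to x = X*cos(yaw) + Z*sin(yaw).
    """
    yaw = initial_yaw_rad
    for _ in range(max_iters):
        c, s = math.cos(yaw), math.sin(yaw)
        num = den = 0.0
        for (X, _Y, Z), x_obs in zip(model_points, observed_x):
            predicted = X * c + Z * s
            jac = -X * s + Z * c  # derivative of predicted x w.r.t. yaw
            num += jac * (x_obs - predicted)
            den += jac * jac
        if den == 0.0:
            break  # degenerate configuration, no usable gradient
        step = num / den  # Gauss-Newton update for one parameter
        yaw += step
        if abs(step) < tol:
            break
    return yaw


# Synthesize observations at a known yaw, then recover it from a 0-degree start.
true_yaw = math.radians(25.0)
observed = [X * math.cos(true_yaw) + Z * math.sin(true_yaw)
            for X, _Y, Z in GENERIC_MODEL]
recovered_yaw = fit_yaw(GENERIC_MODEL, observed)
```

Even in this one-parameter case the cost is not convex in the angle, so the starting guess determines which minimum the iteration settles into, which mirrors the initialization sensitivity noted above.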