The present invention relates to a system for determining poses of a test subject. More particularly, it relates to determining a direction the head of a driver of an automobile is facing.
Operator error is a principal cause of automobile accidents. Different methods have been developed to alert drivers to dangerous conditions in order to prevent errors. As part of such systems, the driver is monitored to determine the driver's state, and whether the driver is paying proper attention to conditions of the road and vehicle. Typically, this is done by including a camera in the vehicle to take a picture of the driver. The picture is then analyzed to determine features and characteristics of the driver, such as whether the eyes are open and which way the driver is looking. The difficulty is in developing a system for analyzing the picture of the driver which operates with different drivers, different lighting conditions, and different driver positions.
Various techniques have been developed for analyzing images of a driver. In order to be useful, a technique must be quick and economical. It must analyze a picture quickly in order to alert the driver of the dangerous condition and allow time for an appropriate response. In addition, it must be economical to build so that it is affordable within an automobile. Techniques which use large amounts of memory or require high speed processors are not sufficiently economical.
In a feature extraction technique, the image is analyzed to determine specific facial features of the driver. Generally, an objective is to determine the size, shape, and position of the eyes. Features are abstracted from the image based upon the light and dark areas in the image. However, with different drivers and lighting conditions, the light and dark areas, particularly around the eyes, can vary greatly. Thus, determining the feature is difficult and requires significant processing. An example of a feature determining and tracking system is illustrated in "Recursive Estimation of Structure and Motion Using the Relative Orientation Constraint", Proceedings IEEE CVPR, 1993, by Messrs. Azarbayejani, Horowitz, and Pentland.
In another technique, templates representing face or feature structures are used in determining head position and orientation. The image is compared with the various templates to determine which one is closest. An example of a template matching system is illustrated in "Tracking Facial Motion", Proceedings of the Workshop on Motion of Non-rigid and Articulated Objects, pp. 36-42, IEEE Computer Society, 1994, by Messrs. Essa, Darrell, and Pentland. The significant differences in driver images resulting from different drivers, different lighting conditions, and different appearances make matching with templates difficult. Furthermore, in performing the matching, significant processing is required to compare an image with each of the templates. Thus, more powerful and faster processors are required, which increases the expense.
In another technique, optical flow is used to determine positions of features, on the assumption that head movements are small. An example of the optical flow technique is shown in "Analysis and Synthesis of Facial Image Sequences Using Physical and Anatomical Models", IEEE Pat. Anal. Mach. Intell., 15(6): 569-579, June 1993, by Messrs. Terzopoulos and Waters. In the optical flow technique, a sequence of images is used to follow features and determine a change in position from one image to the next. This requires fast processing so that the head movements between images are small. It also requires significant processing power in order to meet these processing speed requirements.
Thus, a need exists for a rapid and low cost image analysis system for determining head position and pose of the driver. A need exists for a system which accommodates different subjects, appearances, and lighting. A need exists for a system which does not require high speed processing.
The present invention uses a non-linear mapping technique to map a sample image to a set of output model parameters. The model parameters principally relate to pose or head position. A training set of images and corresponding model parameters is used to learn the mapping from the inputs to the outputs. Non-parametric estimation techniques, such as nearest neighbor estimation techniques, can be used for comparing the sample image to the training set images in order to determine the output parameter.
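By way of illustration only, the nearest neighbor estimation described above can be sketched as follows. The flat pixel-vector representation, the squared Euclidean distance, the function names, and the yaw-angle labels are assumptions made for this sketch and are not specified in the description.

```python
from math import inf


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length pixel vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor_pose(sample, training_set):
    """Return the pose paired with the training image closest to `sample`.

    `training_set` is a list of (image_vector, pose) pairs. Both the vector
    representation and the pose labels are assumptions of this sketch.
    """
    best_pose, best_dist = None, inf
    for image, pose in training_set:
        d = squared_distance(sample, image)
        if d < best_dist:
            best_pose, best_dist = pose, d
    return best_pose


# Toy 4-pixel "images", each labeled with a hypothetical head yaw in degrees.
training_set = [
    ([0.9, 0.1, 0.9, 0.1], -30.0),
    ([0.5, 0.5, 0.5, 0.5], 0.0),
    ([0.1, 0.9, 0.1, 0.9], 30.0),
]
pose = nearest_neighbor_pose([0.2, 0.8, 0.2, 0.7], training_set)
```

This exhaustive search compares the sample with every training image, so its cost grows linearly with the size of the training set.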
According to another aspect of the invention, a tree-structure vector quantization technique is used to organize the images in the training set in order to reduce processing time and indexing costs. Each of the images, or data points, in the training set is a leaf of the tree. When an input image is received, the tree is traversed to determine a closest data point in the training set. The output parameter of the closest data point in the training set, i.e., a corresponding pose, is outputted.
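A minimal sketch of such a traversal is given below, assuming a binary tree in which every node stores a representative vector (here called a centroid) and every leaf stores the pose of one training image. The `Node` class, its field names, and the hand-built toy tree are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    centroid: List[float]            # representative vector for this subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    pose: Optional[float] = None     # set only on leaves (one training image)


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lookup_pose(root, sample):
    """Descend from the root, at each node choosing the child whose centroid
    is closer to `sample`, and return the pose stored at the leaf reached."""
    node = root
    while node.left is not None and node.right is not None:
        if sqdist(sample, node.left.centroid) <= sqdist(sample, node.right.centroid):
            node = node.left
        else:
            node = node.right
    return node.pose


# Hand-built toy tree over four 2-pixel "images" with hypothetical yaw labels.
leaf_a = Node([0.9, 0.1], pose=-30.0)
leaf_b = Node([0.6, 0.4], pose=-10.0)
leaf_c = Node([0.4, 0.6], pose=10.0)
leaf_d = Node([0.1, 0.9], pose=30.0)
root = Node(
    [0.5, 0.5],
    left=Node([0.75, 0.25], left=leaf_a, right=leaf_b),
    right=Node([0.25, 0.75], left=leaf_c, right=leaf_d),
)
pose = lookup_pose(root, [0.2, 0.8])
```

The descent performs two distance computations per level, so the number of comparisons depends on the depth of the tree and not on the number of training images. Because the descent is greedy, the leaf reached is not guaranteed to be the exact nearest neighbor in every case.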
According to another aspect of the invention, in building the tree structure, k-means clustering (with k=2) is used to separate the data points iteratively into the two sides of each tree node. Alternatively, according to another aspect of the invention, principal components analysis (PCA) is used to find a direction of maximal variation for the data points assuming an n-dimensional space. The data points are then separated into halves in the direction of maximal variation.
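The two ways of splitting a node can be sketched as follows. This is an illustration under stated assumptions: the initialization of the two centroids from the first and last data points, the use of power iteration to approximate the principal direction, the starting vector, and the split at the median projection are choices made for this sketch, not details taken from the description.

```python
def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def two_means_split(points, iterations=10):
    """Split `points` into two groups with k-means clustering, k=2."""
    c0, c1 = list(points[0]), list(points[-1])   # assumed initialization
    for _ in range(iterations):
        g0 = [p for p in points if sqdist(p, c0) <= sqdist(p, c1)]
        g1 = [p for p in points if sqdist(p, c0) > sqdist(p, c1)]
        if not g0 or not g1:
            break                                # degenerate split, stop early
        c0, c1 = mean(g0), mean(g1)
    return g0, g1


def principal_direction(points, iterations=50):
    """Approximate the direction of maximal variation by power iteration on
    the scatter matrix of the data, without forming the matrix explicitly."""
    dim = len(points[0])
    mu = mean(points)
    centered = [[x - m for x, m in zip(p, mu)] for p in points]
    v = [1.0 + j for j in range(dim)]            # assumed starting vector
    for _ in range(iterations):
        w = [0.0] * dim
        for x in centered:
            dot = sum(a * b for a, b in zip(x, v))
            for j in range(dim):
                w[j] += dot * x[j]
        norm = sum(a * a for a in w) ** 0.5
        if norm == 0.0:
            break
        v = [a / norm for a in w]
    return mu, v


def pca_split(points):
    """Order points by their projection on the principal direction and
    separate them into halves at the median."""
    mu, v = principal_direction(points)
    ordered = sorted(
        points, key=lambda p: sum((x - m) * d for x, m, d in zip(p, mu, v))
    )
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]


points = [[0.0, 0.0], [1.0, 0.1], [10.0, 0.0], [11.0, 0.1]]
km_left, km_right = two_means_split(points)
pca_left, pca_right = pca_split(points)
```

Applying either function recursively to each resulting group, until every group holds a single data point, yields a tree whose leaves are the training images. The median split always produces halves of nearly equal size, whereas the k=2 clustering follows the natural grouping of the data and may produce an unbalanced tree.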
According to another aspect of the invention, a cropping window is used to select a portion of an image in order to limit the image to the face. The training set includes images with faces offset or at different distances in order to determine adjustments to the cropping window.
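One plausible reading of the cropping step is sketched below. It assumes that the closest training image carries, in addition to a pose, a hypothetical offset (dx, dy) and scale factor describing how the face sits in the window, and that these values are used to move and resize the window for the next image. The window layout (top, left, height, width) and both function names are assumptions of this sketch.

```python
def crop(image, window):
    """Return the part of `image` (a list of pixel rows) inside `window`,
    given as (top, left, height, width)."""
    top, left, height, width = window
    return [row[left:left + width] for row in image[top:top + height]]


def adjust_window(window, dx, dy, scale):
    """Shift the window by (dx, dy) pixels and resize it by `scale` about its
    center, clamping the top-left corner so it stays non-negative."""
    top, left, height, width = window
    new_height = int(round(height * scale))
    new_width = int(round(width * scale))
    new_top = max(0, top + dy - (new_height - height) // 2)
    new_left = max(0, left + dx - (new_width - width) // 2)
    return (new_top, new_left, new_height, new_width)


# 4x4 toy image whose pixel value encodes its position (row * 4 + column).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
window = (1, 1, 2, 2)
face = crop(image, window)
# Suppose the closest training image indicates the face sits one pixel to the
# right of the window center (hypothetical label): shift the window to follow.
shifted = adjust_window(window, dx=1, dy=0, scale=1.0)
face_shifted = crop(image, shifted)
```

The sketch does not clamp the window against the right and bottom edges of the image; a window that extends past those edges simply yields a smaller crop.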