According to the NHTSA, 10% of all fatal crashes in the United States are due to driver distraction, with 3,179 people killed and an estimated 431,000 people injured in 2014 (NHTSA's National Center for Statistics and Analysis, technical report Traffic safety facts: Distracted driving 2014). Thus, monitoring the distraction level of the driver will become a critical success factor for next-generation vehicles. Head pose, facial expression, and eyelid movements may all contribute to an overall assessment of the driver's distraction level.
The 2020 roadmap of the European New Car Assessment Programme (technical report, March 2015) includes a schedule for the promotion of virtual co-pilot concepts and innovations in the field of driver state monitoring. Vehicle manufacturers will be given credit if they provide such safety technologies not just as an add-on feature, but as standard equipment.
With self-driving cars, the driver must take over control in critical or complex situations. The take-over decision, however, also depends on the state of the driver and thus self-driving cars must rely on driver-status monitoring.
The social obligation to reduce fatal injuries in vehicle crashes has pushed car manufacturers and their suppliers to build sensor systems that not only observe the outside world of a vehicle but also monitor the interior of the vehicle, especially the state of the driver of the machinery.
Common systems for driver state monitoring based on visual sensors require the sensors to be mounted in particular locations (for example, on the steering wheel, as in U.S. Published Application 20100002075 A1), which imposes tough constraints on the design process of such systems.
Alternative systems for driver state monitoring are based on very different features and input sources, such as the driver's steering behavior, as disclosed, for example, in U.S. Pat. No. 5,815,070 (Driving state-monitoring apparatus for automotive vehicles), or the driver's ability to respond to an interrogation signal, as in U.S. Pat. No. 6,154,123 (Driver alertness monitoring system). The system disclosed in U.S. Pat. No. 6,049,747 (Driver monitoring device) is focused on a particular way of obtaining 3D data by projecting a pattern of bright spots on the driver's face. Further systems, such as in U.S. Pat. No. 7,138,922, assume the existence of a drowsy-driver detector and focus on how to communicate with the drowsy driver by involving a human operator.
Driver state monitoring often relates to face detection. Methods for detecting faces in two-dimensional images are described in a number of scientific publications, of which the most frequently cited one is the standard method developed by Paul Viola and Michael J. Jones (Robust real-time face detection. International Journal of Computer Vision, 57(2):137-154, 2004). Further methods are, for example, disclosed in WO Patent App. PCT/EP2007/006540 by Steinberg et al. and in U.S. patent application Ser. No. 14/563,972 by Corcoran et al.
Most methods for face detection and head tracking rely on facial features or landmarks. The general workflow is to maintain an internal object model including the landmark positions. For every new image, landmarks of the internal model are matched with the current view from the sensor to obtain the relative position between object and sensor. Such methods may fail when landmarks become invisible (e.g. when the user turns away from the sensor) or temporarily occluded (e.g. when the user scratches his or her nose). In some cases, such landmarks cannot be detected at all, e.g. for certain types of glasses, hair and beard styles. Further, variations in illumination, reflections of light from glasses, sunglasses and contact lenses may hinder the detection of valid landmarks.
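The matching step of this workflow can be illustrated with a minimal sketch. It is deliberately simplified to two dimensions and is not taken from any of the cited documents: the function names, the landmark names, and the convention of marking an undetected landmark with None are all assumptions made for illustration. Given a stored landmark model and the landmarks detected in the current image, the sketch computes the least-squares rotation and translation between them and fails when too few landmarks are visible.

```python
import math


def apply_pose(theta, translation, point):
    """Rotate a 2D point by theta (radians) and then translate it."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = point
    return (c * x - s * y + translation[0], s * x + c * y + translation[1])


def estimate_pose_2d(model, observed):
    """Estimate the rotation angle and translation that map the stored model
    landmarks onto the landmarks observed in the current image.

    model:    dict name -> (x, y) in the model frame
    observed: dict name -> (x, y) in the image, or None if not detected
    (hypothetical data layout, chosen for this sketch only)
    """
    # Keep only landmarks that were actually detected in this frame.
    names = [n for n in model if observed.get(n) is not None]
    if len(names) < 2:
        raise ValueError("too few visible landmarks to estimate a pose")

    count = len(names)
    mcx = sum(model[n][0] for n in names) / count
    mcy = sum(model[n][1] for n in names) / count
    ocx = sum(observed[n][0] for n in names) / count
    ocy = sum(observed[n][1] for n in names) / count

    # Accumulate dot and cross products of the centered point pairs; the
    # least-squares rotation angle is atan2(sum of crosses, sum of dots).
    dot = cross = 0.0
    for n in names:
        ax, ay = model[n][0] - mcx, model[n][1] - mcy
        bx, by = observed[n][0] - ocx, observed[n][1] - ocy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)

    # The translation moves the rotated model centroid onto the observed one.
    c, s = math.cos(theta), math.sin(theta)
    tx = ocx - (c * mcx - s * mcy)
    ty = ocy - (s * mcx + c * mcy)
    return theta, (tx, ty)
```

In this sketch an occluded landmark is simply dropped from the correspondence set, so the estimate degrades gradually and then fails outright once fewer than two landmarks remain, which mirrors the failure mode described above.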
Generally, landmark-based methods rely on front-facing sensors, i.e. the sensor is mounted such that the operator's face points directly towards the sensor in the default position. However, in the most prominent application of monitoring the behavior of a driver of a vehicle, the sensor will most likely be mounted in non-facing locations such as the A-beam, the rear mirror location, or the center console.
In WO Patent App. PCT/AU2008/000,290, Tell disclosed a typical workflow for a landmark-based method in which a three-dimensional object is rendered, salient point features or landmarks are extracted from the three-dimensional object model, corresponding features are localized in an image, and the new object orientation is derived from the correspondences between the landmarks of the object model and the view. However, the method focuses on point features defined to be at a predefined number of locations and having the highest edginess. Occlusion of some of the predefined locations might hinder the application, and the resolution of the image sequence is critical for achieving the required performance level.
Head pose estimation is most commonly interpreted as the ability to infer the orientation of the head relative to the view of a camera. Before the development of affordable 3D sensors, early head tracking techniques were limited to using grayscale or color image sequences. A good overview of these methods is given in a publication by Erik Murphy-Chutorian and Mohan Manubhai Trivedi (Head pose estimation in computer vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(4):607-626, 2009).
In U.S. patent application Ser. No. 11/420,864, Victor et al. disclose a method for drowsiness detection, which is based on the assumption that drowsiness leads to a relaxation of muscles, which in turn leads to specific head movements that can be identified by head pose detection.
Metrics for measuring the attention level have been disclosed by Langdale-Smith et al. in WO Patent App. PCT/AU2009/001,547 and may include the orientation of faces and eyes, the duration of looking at a particular region of interest, the duration of facing a region of interest, facial reaction, and relative changes in facial expression. However, that application does not disclose a technically feasible way to retrieve and quantify the required features, e.g. the facial expressions.
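One of the simpler metrics listed above, the duration of facing a region of interest, can be sketched as follows once a head orientation is available for each frame. The sketch is not taken from the cited application: the function name, the sample layout (timestamp in seconds, yaw and pitch in degrees), and the rectangular region of interest are assumptions for illustration.

```python
def glance_durations(samples, yaw_range, pitch_range):
    """Return (total_seconds, longest_continuous_seconds) during which the
    head orientation lies inside a rectangular region of interest.

    samples: list of (timestamp_s, yaw_deg, pitch_deg), sorted by time.
    Each sample is assumed to hold until the next one (zero-order hold),
    so the last sample contributes no duration.
    """
    total = longest = current = 0.0
    for (t0, yaw, pitch), (t1, _, _) in zip(samples, samples[1:]):
        inside = (yaw_range[0] <= yaw <= yaw_range[1]
                  and pitch_range[0] <= pitch <= pitch_range[1])
        if inside:
            dt = t1 - t0
            total += dt
            current += dt
            longest = max(longest, current)
        else:
            # The glance was interrupted; restart the continuous counter.
            current = 0.0
    return total, longest
```

Such a computation presupposes that the head orientation itself has already been retrieved reliably, which is the part the cited application leaves open.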
In WO Patent App. PCT/AU2010/000,142, Langdale-Smith et al. disclose a method for monitoring the attentiveness of an operator of machinery with respect to the motion of the vehicle. They take into account only the three-dimensional position of the operator's head and do not consider facial expressions.
Most methods that operate on faces assume that the sensor observes the bare face. In general, however, a driver or machine operator may wear eyeglasses, helmets, or other protective equipment that partially occludes facial landmarks. Thus, even methods that enhance facial features before classification by using local image operators, such as those disclosed by Loy et al. in U.S. patent application Ser. No. 10/951,081, will most likely fail. Additionally, such methods require the visual appearance of the landmarks to be known in advance. The protective equipment of a machine operator will most likely provide good features for visual tracking, but its appearance will not be known in advance and may vary largely between operators.
Besides only detecting faces, some methods further process the faces to derive, for example, gaze direction from head or eye positions (U.S. patent application Ser. No. 10/350,835 and U.S. Pat. No. 7,043,056) or facial expression from eyes and lips (U.S. patent application Ser. No. 14/680,977). Some driver monitoring systems focusing exclusively on eye tracking and drowsiness detection have been proposed. In U.S. patent application Ser. No. 14/484,875, Seok et al. disclose a combined gaze tracking and finger detection method to control head-up displays in a vehicle.
Other methods such as in U.S. Pat. No. 5,229,754 adapt displays such as head-up displays according to the head pose.
A common alternative to eye tracking is monitoring the head pose of the driver as an approximation of where the driver is looking. Such methods have been proposed in U.S. Pat. No. 5,691,693, WO Patent App. PCT/US2001/047,612, U.S. patent application Ser. No. 11/317,431, and U.S. patent application Ser. No. 11/796,807, but are not sufficiently accurate.
One of the first methods for reconstructing a rigid object using a low-cost consumer depth sensor, called Kinect Fusion, was proposed by Shahram Izadi et al. (Kinectfusion: Real-time 3d reconstruction and interaction using a moving depth camera. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, pages 559-568, 2011). In U.S. Pat. No. 9,251,590, U.S. patent application Ser. No. 13/017,474, and U.S. patent application Ser. No. 13/017,587, data from a Microsoft Kinect RGBD sensor were used to reconstruct surfaces and estimate the current camera position relative to that surface. First, these methods iteratively track the camera position by aligning the current image with an integrated image (obtained by integrating a series of previous images) using an ICP-based (Iterative Closest Point) method. Then, the volume is integrated and views of the reconstructed surface are estimated by ray casting. Here, deviations from the model are regarded as noise, whereas in our method they are treated as information that can be used to distinguish object states.
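The ICP-based alignment step can be illustrated by the following minimal sketch. It is a two-dimensional, point-to-point simplification with brute-force nearest-neighbor search, not the implementation used in the cited documents, and all function and parameter names are assumptions for illustration. Each iteration matches every point of the current scan to its closest reference point, fits the rigid motion that best aligns the matched pairs, and composes that motion with the pose accumulated so far.

```python
import math


def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src[i] onto dst[i]."""
    n = len(src)
    scx = sum(x for x, _ in src) / n
    scy = sum(y for _, y in src) / n
    dcx = sum(x for x, _ in dst) / n
    dcy = sum(y for _, y in dst) / n
    dot = cross = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        ax, ay = sx - scx, sy - scy
        bx, by = dx - dcx, dy - dcy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, dcx - (c * scx - s * scy), dcy - (s * scx + c * scy)


def transform(points, theta, tx, ty):
    """Rotate all points by theta (radians) and then translate them."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def icp_2d(current_scan, reference, iterations=10):
    """Align current_scan to reference; return (theta, tx, ty, residuals)."""
    theta, tx, ty = 0.0, 0.0, 0.0
    moved = list(current_scan)
    for _ in range(iterations):
        # 1. Match every moved point to its closest reference point.
        matches = [min(reference, key=lambda r: math.dist(p, r)) for p in moved]
        # 2. Fit the rigid step that best aligns the matched pairs.
        d_theta, d_tx, d_ty = fit_rigid_2d(moved, matches)
        moved = transform(moved, d_theta, d_tx, d_ty)
        # 3. Compose the step with the accumulated pose.
        c, s = math.cos(d_theta), math.sin(d_theta)
        tx, ty = c * tx - s * ty + d_tx, s * tx + c * ty + d_ty
        theta += d_theta
    # Remaining distance of each aligned point to the reference.
    residuals = [min(math.dist(p, r) for r in reference) for p in moved]
    return theta, tx, ty, residuals
```

The returned residuals are the remaining per-point deviations from the reference after alignment, i.e. the quantity that reconstruction methods of this kind regard as noise.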
Some extensions allow the object surface to be estimated even when the object is deformed while being scanned (Richard A. Newcombe, Dieter Fox, and Steven M. Seitz. Dynamic fusion: Reconstruction and tracking of non-rigid scenes in real-time; and Mingsong Dou, Jonathan Taylor, Henry Fuchs, Andrew Fitzgibbon, Shahram Izadi. 3D scanning deformable objects with a single rgbd sensor; both published in The IEEE Conference on Computer Vision and Pattern Recognition, 2015). To this end, a deformation function is continuously updated during the scanning process. However, the goal there is to compensate for the deformations, not to extract additional useful information for further processing.
This application describes improvements in systems and methods for identifying driver states and driver head pose.