Driver fatigue and lack of sleep, especially among drivers of large vehicles such as trucks and buses, have been a growing problem in recent years. According to the United States National Highway Traffic Safety Administration, approximately 240,000 motor vehicle accidents occur per year in the U.S. alone due to driver fatigue and lack of sleep. Sleep-related accidents cost the American government and businesses an estimated 46 billion dollars a year. Automatically detecting a loss of driver alertness early enough to warn drivers of their fatigue can save U.S. taxpayers and businesses a significant amount of money and personal suffering.
Work on driver alertness has not yet led to a system that works in a moving vehicle. Also, none of the known proposals appears to deal adequately with additional complications such as mouth opening and closing, full occlusion, or blinking of a driver. For example, Yamamoto et al., Journal of SAE Japan, 46(9), 1969, did not present any methods to acquire the driver's state. Further, their method relies on light emitting diodes (LEDs) and uses multiple cameras to estimate facial orientation. A moving vehicle presents new challenges, such as variable lighting and changing backgrounds, that are not easily solved. Most of the earlier papers on driver alertness have used intrusive techniques to acquire driver vigilance information.
In a more recent publication, Ji et al., Procs. Honda Symposium, pp. 48-55, 1999, use multiple cameras, with one camera viewing the entire face and one camera viewing the eyes only. Their idea is to move the eye camera on the fly to get the best image of the eyes, since their technique uses eye information such as blink frequency. They use LEDs to minimize problems with lighting conditions. To get a more accurate estimation, they propose to analytically estimate the local gaze direction based on pupil location, and mention the use of Bayesian networks to acquire information on driver vigilance.
Several techniques have been proposed for improving the monitoring and vigilance of drivers, particularly drivers of large truck rigs, to prevent them from falling asleep at the wheel, which generally results in catastrophic highway wrecks. A number of these approaches will now be described.
SAE Technical Paper Series 942321 describes a known system of analyzing a “driver's facial expression, frequency of their secondary movement . . . (yawning etc . . . ) for alertness as video images alertness levels.” This technique measures external factors such as the space in front of the car, the steering wheel, the lateral position of the car, and the speed of the vehicle, but makes no mention of detecting driver alertness with computer vision.
SAE Technical Paper Series 942326 describes “closed circuit Televisions” (CCTV) and a video camera to monitor driver behavior, and video instrumentation to monitor a driver's face. This technique describes the use of braking and shifting information, including steering patterns, brain waves, and revolutions per minute (rpm). Video images are used to manually obtain ground truth for deciding the driver's vigilance level. There was no use of camera data for computer vision purposes.
In addition to the publications referred to above, the inventors are aware of several United States Patents that propose related techniques, which will now be described.
U.S. Pat. No. 5,798,696 to Metalis describes sensors that can detect “Headrolls” to determine driver impairment. However, these “sensors” are intrusive and require the subject to wear eyeglasses. The system also uses accelerometers and measures the driver's performance by means of lateral vehicle movements.
U.S. Pat. No. 5,795,116 to Wilson-Jones et al. describes a system using video cameras on vehicles to detect lane markings and the vehicle's position relative thereto, and does not use computer vision.
U.S. Pat. No. 5,691,693 to Kithil describes a system for detecting head position and head motion with sensors (see abstract) for determining driver “impairment.” However, this technique does not use computer vision techniques or cameras, and instead uses capacitive coupling and true sensors to locate the head, with no disclosure of how it compares or measures head motions against predefined head motions.
U.S. Pat. No. 5,689,241 to Clarke, Sr. et al. describes a system using a “digital camera” to focus on eye and nose facial features, and detects head and eye movement as a driver alertness system. This technique uses infrared technology to detect facial features, with thermal sensors as the main criterion for determining driver alertness. These sensors measure the temperature of facial regions such as the nose and mouth. However, this technique does not show how to locate the face initially, nor does it mention rotation as a factor in determining driver alertness. This technique would not be able to deal with rotation of the head, which can occur with driver fatigue and loss of sleep. This technique detects eye blinking by using temperature differences, which is unrelated to computer vision.
With the advent of the electronic age and the increase in catastrophic wrecks of big rigs on the highway system, driver alerting systems have employed some computer vision techniques, which will now be described.
U.S. Pat. No. 5,008,946 to Ando describes a system for recognizing images using a television type camera to analyze various facial features such as eyes, mouth, and facial detection to control electrical devices in a vehicle. This patent's algorithms are simple but ineffective. This technique uses electrical devices to look for certain motions and is not able to determine driver alertness, since it cannot recognize unrestricted movements, uses no kind of hierarchical tracking, does not address full facial occlusion, and requires the use of mirrors to shine light on the driver's face.
U.S. Pat. No. 5,795,306 to Shimotani et al. describes a system using CCD cameras to detect features of a driver's face such as pupil position (blinking, etc.) to determine drowsiness levels. Since this technique performs a tilt analysis over two or more minutes, it does not perform any real time driver alertness detection. It also uses infrared technology, lights that shine on the driver's face, and a mirror system that directs light onto the driver's face.
U.S. Pat. No. 6,130,617 to Yeo describes a system of using a CCD camera to detect eyes and nostril area to determine if a driver is drowsy. This technique uses binary images for detection. However, this technique could break down with varying lighting conditions.
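The sensitivity of binary-image detection to lighting can be illustrated with a minimal sketch. The code below is not taken from the Yeo patent; the fixed threshold, the synthetic gray-scale patch, and the helper names `binarize`, `scale_brightness`, and `count_dark` are all hypothetical and chosen only to show how a fixed threshold can lose the dark eye pixels when overall brightness changes.

```python
def binarize(image, threshold):
    """Return a binary image: 1 where a pixel is darker than the threshold."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def scale_brightness(image, gain):
    """Simulate a lighting change by scaling every pixel, clipped to 255."""
    return [[min(255, int(px * gain)) for px in row] for row in image]


def count_dark(binary):
    """Count the pixels marked as dark in a binary image."""
    return sum(sum(row) for row in binary)


# Hypothetical 4x6 gray-scale patch: skin is about 180, two "eye" pixels are 60.
face = [
    [180, 180, 180, 180, 180, 180],
    [180,  60, 180, 180,  60, 180],
    [180, 180, 180, 180, 180, 180],
    [180, 180, 180, 180, 180, 180],
]

THRESHOLD = 100  # assumed fixed threshold, tuned for the nominal lighting

# Nominal lighting: only the two eye pixels fall below the threshold.
normal = count_dark(binarize(face, THRESHOLD))
# Shadow (half brightness): skin drops to 90, so every pixel looks "dark".
shadow = count_dark(binarize(scale_brightness(face, 0.5), THRESHOLD))
# Glare (double brightness): the eye pixels rise to 120 and disappear.
glare = count_dark(binarize(scale_brightness(face, 2.0), THRESHOLD))

print(normal, shadow, glare)
```

In this toy case the eye pixels are isolated only under the lighting for which the threshold was chosen, which is the kind of breakdown noted above.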
U.S. Pat. No. 5,786,765 to Kumakura et al. describes a driver alertness system using a camera to detect eye blinkage levels to determine driver alertness. Their system uses only eye data and does not take into account head rotation or occlusion. Furthermore, their system does not say how blinks are computed. They use eye blink frequency, but nowhere do they describe how to detect the eyes. Also, the driver vigilance system waits a whole minute before making a determination of driver alertness, which would be too long for use as a real time warning or alarm system.
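Because the reference does not disclose how blinks are computed, the following is only an assumed illustration of what a blink-frequency measure over a one-minute window might look like. The per-frame open/closed flags, the frame rate `FPS`, and the function names are hypothetical; the sketch mainly shows why a full window of frames must accumulate before any rate can be reported.

```python
def count_blinks(eye_open):
    """Count open-to-closed transitions in a per-frame list of booleans."""
    return sum(1 for prev, cur in zip(eye_open, eye_open[1:]) if prev and not cur)


def blinks_per_minute(eye_open, fps):
    """Convert the blink count over the observed frames to a per-minute rate."""
    seconds = len(eye_open) / float(fps)
    return count_blinks(eye_open) * 60.0 / seconds


FPS = 10  # assumed camera frame rate

# Hypothetical one-minute recording: eyes open for 3.8 s, closed for 0.2 s,
# repeated 15 times (600 frames in total).
one_minute = ([True] * 38 + [False] * 2) * 15

rate = blinks_per_minute(one_minute, FPS)

# Frames that must be collected before a one-minute window is complete.
frames_needed = 60 * FPS

print(rate, frames_needed)
```

With a window of this length, no determination is available until all 600 frames have been observed, which is the latency objected to above.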
U.S. Pat. No. 5,786,765 to Galiana et al. describes an alertness monitor that checks both head motion by sensors and eyelid movement by digital type cameras, and activates alarms when threshold levels are reached; it also makes several other unsubstantiated claims. This technique would not work during rotation or other prolonged occlusion of a driver's head.
U.S. Pat. No. 6,070,098 to Moore-Ede et al. describes a system of using video data to detect head movement and eye tracking data to detect eye blinking, open and closed position, to check if the data exceeds threshold levels for a driver alertness system. It uses neural networks to compare abnormal movements such as blank stares and yawning, and mentions classifying motions automatically by a “neuro-fuzzy” system. It is said that the hybrid network generates and learns new categories of eye/head movement, but there is no discussion of how the method works, and the results are not presented in a convincing way.
U.S. Pat. No. 5,835,616 to Lobo et al. (one of the inventors of the subject invention) describes a digital video camera system for detecting facial features such as eyes, lips, and sides of face, and uses methods that rely on gray scale data. The system does not analyze video sequences and has a very controlled environment which would have difficulty being adapted to drivers.
In addition to the above publications and patents, the inventors are aware of recent techniques that are both complex and inadequate for tracking facial images and features to monitor the alertness of drivers suffering from fatigue and lack of sleep.
For example, it is known to use a method to detect the face and eyes of a person's head that uses multiscale filters, such as an elongated second derivative Gaussian filter, to get the pre-attentive features of objects. These features can be supplied to different models to further analyze the image. The first is a structural model that partitions the features into facial candidates and incorporates an eyebrow model to avoid misclassifications. Once a geometric structure that fits the constraints is obtained, affine transformations can be used to fit the real world face. Next, the system uses a texture model that measures color similarity of a candidate with the face model, which includes variation between facial regions, symmetry of the face, and color similarity between regions of the face. The texture comparison relies on the cheek regions. Finally, a feature model is used to obtain the location of the eyes. This method uses eigen-eyes and image feature analysis. In addition, it uses the fact that the pre-attentive features of the eyes must point in roughly the same direction. The system then zooms in on the eye region and performs more detailed analysis, which includes Hough transforms to find circles and reciprocal operations using contour correlation.
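As a rough illustration of the first stage only, an elongated second derivative Gaussian filter can be sketched as a kernel that takes the second derivative of a Gaussian across one axis and smooths with a wider Gaussian along the other. The sigmas, the kernel radius, and the synthetic test patches below are assumptions made for the example and are not parameters of the method described above.

```python
import math


def elongated_d2_gaussian(sigma_across, sigma_along, radius):
    """Build a square kernel: second derivative of a Gaussian across rows (y),
    plain Gaussian smoothing along columns (x). Unnormalized."""
    kernel = []
    for y in range(-radius, radius + 1):
        g_across = math.exp(-y * y / (2.0 * sigma_across ** 2))
        # d^2/dy^2 of exp(-y^2 / (2 s^2)) = (y^2 / s^4 - 1 / s^2) * exp(...)
        d2 = (y * y / sigma_across ** 4 - 1.0 / sigma_across ** 2) * g_across
        kernel.append([d2 * math.exp(-x * x / (2.0 * sigma_along ** 2))
                       for x in range(-radius, radius + 1)])
    return kernel


def response(kernel, patch):
    """Filter response at the patch center (sum of element-wise products)."""
    return sum(k * p for krow, prow in zip(kernel, patch)
               for k, p in zip(krow, prow))


def make_patch(size, bar):
    """Bright patch (200) with a dark bar (50) through the center.
    bar is 'h' (horizontal), 'v' (vertical), or None (no bar)."""
    mid = size // 2
    return [[50 if (bar == "h" and r == mid) or (bar == "v" and c == mid) else 200
             for c in range(size)] for r in range(size)]


RADIUS = 4
kernel = elongated_d2_gaussian(sigma_across=1.0, sigma_along=3.0, radius=RADIUS)
size = 2 * RADIUS + 1

flat = response(kernel, make_patch(size, None))
horizontal = response(kernel, make_patch(size, "h"))
vertical = response(kernel, make_patch(size, "v"))

print(flat, horizontal, vertical)
```

Because the kernel is elongated along the horizontal axis, a dark horizontal bar (loosely, an eye or eyebrow in a frontal view) produces a much stronger response than a vertical bar or a flat region. Running the same kernel at several sigmas would give the multiscale behavior mentioned in the text.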
Another approach is a system using 3D (three dimensional) vision techniques to estimate and track the 3D line of sight of a person using multiple cameras. This approach also uses multiple point light sources to estimate the line of sight without using user-dependent parameters, thus avoiding cumbersome calibration processes. The method uses a simplified eye model, and first uses the Purkinje images of point light sources to determine eye location. Linear constraints are then used to determine the line of sight, based on an estimate of the cornea center.
Finally, another method uses Support Vector Machines (SVMs) to solve pattern recognition problems. SVMs are relatively old, but applications involving real pattern recognition problems are recent. First, skin color-based segmentation is performed, based on single Gaussian chrominance models and a Gaussian mixture density model. Feature extraction is performed using Orthogonal Fourier-Mellin Moments. It is then shown that, for all chrominance spaces, SVMs applied to the Mellin Moments perform better than a 3-layer perceptron Neural Network.
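The skin color segmentation step can be illustrated with a minimal single Gaussian chrominance model. This is a sketch under stated assumptions rather than the cited method: the (Cb, Cr) training samples are invented, the function names are hypothetical, and the decision threshold of 9.21 (roughly the 99% chi-square quantile for two degrees of freedom) is an assumed choice. The Gaussian mixture model, the Fourier-Mellin feature extraction, and the SVM classifier are not shown.

```python
def fit_gaussian(samples):
    """Fit the mean and 2x2 covariance of two-channel chrominance samples."""
    n = len(samples)
    mean_a = sum(a for a, _ in samples) / float(n)
    mean_b = sum(b for _, b in samples) / float(n)
    var_a = sum((a - mean_a) ** 2 for a, _ in samples) / (n - 1)
    var_b = sum((b - mean_b) ** 2 for _, b in samples) / (n - 1)
    cov_ab = sum((a - mean_a) * (b - mean_b) for a, b in samples) / (n - 1)
    return (mean_a, mean_b), (var_a, cov_ab, var_b)


def mahalanobis_sq(pixel, mean, cov):
    """Squared Mahalanobis distance of a chrominance pair from the model."""
    var_a, cov_ab, var_b = cov
    det = var_a * var_b - cov_ab ** 2
    da = pixel[0] - mean[0]
    db = pixel[1] - mean[1]
    # d^T * inverse(cov) * d, using the closed-form inverse of a 2x2 matrix.
    return (var_b * da * da - 2.0 * cov_ab * da * db + var_a * db * db) / det


def is_skin(pixel, mean, cov, threshold=9.21):
    """Label a pixel as skin if it lies close enough to the Gaussian model."""
    return mahalanobis_sq(pixel, mean, cov) <= threshold


# Hypothetical (Cb, Cr) samples taken from skin regions.
skin_samples = [(110, 150), (112, 152), (108, 149),
                (115, 155), (105, 146), (111, 153)]
mean, cov = fit_gaussian(skin_samples)

print(is_skin((110, 150), mean, cov))  # a sample near the model mean
print(is_skin((128, 128), mean, cov))  # neutral gray chrominance
```

A single Gaussian keeps the model to five numbers per chrominance space, at the cost of fitting only one cluster of skin tones, which is presumably why a mixture density model is also considered.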
The driver alertness techniques set forth above generally rely on non-camera methods, which do not actuate the alerting signal in sufficient time to avert an accident. Those that do use cameras: use video sequences, whose techniques are vastly different from those used for single camera images; use artificial or infrared lighting, or systems of mirrors to reflect light on the driver's face, to determine vigilance; operate only under controlled situations (not in a fully unrestricted daytime environment); do not disclose the use of a single camera without artificial or infrared lighting and without systems of mirrors to reflect light on the driver's face to determine driver vigilance; and disclose no algorithmic system that actually reconstructs the driver's gaze by focusing on the driver's face.