1. Field of Invention
The present invention is directed to a method and apparatus for detecting drowsiness in the operator of a motorized vehicle.
2. Description of Related Art
Extraction and processing of video images have been investigated for several decades. Only recently, however, have systems become available with sufficient processing power and miniaturization to allow digital video acquisition and processing in real time. Such systems can be configured to operate with modern laptop computers or equivalent embedded processing systems. Such an instrument can also be employed to monitor a variety of biological processes, extracting data in macroscopic and computerized-microscopy environments to drive alarms and to produce control signals for biological production systems.
The importance of drowsiness detection has become increasingly evident with progress in sleep physiology. Many vehicle operators, such as long-distance truck drivers, are known to be sleep-deprived, either because of occupational demands (long hours or non-daytime shift work) or because of sleep apnea, behavioral sleep disorders, or sleep fragmented by physical conditions such as prostatism. Such operators are far more likely to become drowsy, particularly on long, monotonous runs such as freeway driving. Drowsy drivers perform much worse than alert drivers, and in some cases their impairment rivals that of intoxicated drivers. Unlike an intoxicated driver, however, who can reasonably anticipate that his performance will be impaired, a drowsiness-prone driver often begins a trip in an alert state, unaware that drowsiness may insidiously erode his performance. He may thus not foresee that his driving may become dangerous, or even fatal, on the road. Similar considerations apply to truckers, train engineers, and pilots.
Drowsiness occurs in several stages, which have both electrophysiologic and physical correlates. One conventional indicator of the state of alertness is the electroencephalogram (EEG). Individuals who become drowsy and drift into sleep tend to show characteristic EEG features. In alert individuals, normal alpha activity (8-13 Hz) is suppressed by sensory stimulation and activity. As drowsiness begins, the alpha amplitude increases and the waveform becomes more regular. Slower, more irregular rhythms then take over, followed by characteristic light-sleep patterns (vertex sharp waves, spindles, theta and some delta activity). By the time these latter features appear, the driver is no longer able to control his vehicle. There are serious problems, however, in using EEG during active transportation. Attaching reliable EEG acquisition electrodes to the scalp requires skill, patience, and time beyond those of the average driver, and keeping the electrodes in position during the normal head movements of driving is generally impracticable. Moreover, electrical noise generated by driver movement and by the automobile environment will generally swamp the EEG signals needed to monitor early signs of drowsiness, whose amplitudes are only 5-20 microvolts. Other electrical methods, such as monitoring eye movements, suffer from the same problems and are even less reliable from a physiologic viewpoint.
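The alpha-band behavior described above can be illustrated numerically. The following Python sketch is illustrative only and forms no part of the present invention: it estimates the fraction of EEG signal power falling in an assumed 8-13 Hz alpha band using a naive discrete Fourier transform. The function name `band_power_fraction`, the band limits, and the synthetic signals are assumptions, and no drowsiness threshold is implied.

```python
import math


def band_power_fraction(samples, fs, lo_hz, hi_hz):
    """Fraction of non-DC spectral power whose frequency lies in [lo_hz, hi_hz].

    Uses a naive O(n^2) discrete Fourier transform so that only the standard
    library is needed; a practical system would use an FFT.
    """
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]  # remove the DC offset
    total = 0.0
    band = 0.0
    for k in range(1, n // 2 + 1):  # positive-frequency bins up to Nyquist
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power = re * re + im * im
        freq = k * fs / n  # bin frequency in Hz
        total += power
        if lo_hz <= freq <= hi_hz:
            band += power
    return band / total if total > 0 else 0.0


fs = 128  # assumed sampling rate, Hz; one second of data gives 1 Hz bins
alpha_only = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
alpha_plus_theta = [math.sin(2 * math.pi * 10 * t / fs) + math.sin(2 * math.pi * 5 * t / fs)
                    for t in range(fs)]
print(band_power_fraction(alpha_only, fs, 8, 13))        # close to 1.0
print(band_power_fraction(alpha_plus_theta, fs, 8, 13))  # close to 0.5
```

A rising alpha fraction over successive epochs would be one crude marker of the earliest drowsiness stage; in a moving vehicle, however, the microvolt-level signal would be buried in the noise described above.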
A variety of sensors for physiologic functions, including respiration, pulse, blood pressure, and driver movement, have also been proposed, but their reliability is relatively low, especially compared to EEG, and, again, attaching the sensors reliably is beyond the competence and interest of the average driver. In any event, a drowsiness detector must be noninvasive and independent of any set-up by the driver.
U.S. Pat. No. 6,243,015 discloses a continuous digital imaging system in which the vertical “eye-width” is tracked continuously and a “drowsiness factor” is purportedly plotted on the basis of this vertical width. FIG. 3 of that patent shows the eye configuration needed to ascertain the vertical width, and video threshold filtering is described to confirm that the eye is in fact being measured. As described, however, this technique requires precise positioning of the image to obtain the 10 points of interest within the palpebral fissure, and the discrimination depends on a threshold determination of appropriate pixel intensity. Under real driving conditions, maintaining this precision of eye focus is not practicable. Furthermore, variations in light intensity, eyeglass reflections, and normal driver facial mobility are likely to make the necessary width values (from lateral to medial) within the fissure unreliable.
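The threshold dependence criticized above can be seen in a minimal sketch of a vertical eye-width measurement. The Python code below is a hypothetical illustration, not the method of U.S. Pat. No. 6,243,015. It assumes a grayscale eye patch given as a list of rows (0 = black, 255 = white) and a fixed intensity threshold, and it defines width as the longest vertical run of below-threshold pixels in any column; the helper name and test patches are assumptions.

```python
def vertical_eye_width(patch, threshold):
    """Longest vertical run of pixels darker than `threshold` in any column.

    `patch` is a list of equal-length rows of grayscale values (0 = black).
    Hypothetical helper for illustration only.
    """
    if not patch:
        return 0
    best = 0
    for col in range(len(patch[0])):
        run = 0
        for row in patch:
            if row[col] < threshold:
                run += 1
                best = max(best, run)
            else:
                run = 0  # a bright pixel (e.g., a glare spot) breaks the run
    return best


# Open eye: a three-pixel-tall dark band.
open_eye = [
    [200, 200, 200, 200],
    [200, 40, 40, 200],
    [200, 30, 30, 200],
    [200, 40, 40, 200],
    [200, 200, 200, 200],
]
# Same eye with an eyeglass reflection brightening the center of the band.
glare_eye = [row[:] for row in open_eye]
glare_eye[2][1] = glare_eye[2][2] = 120

print(vertical_eye_width(open_eye, 100))   # 3
print(vertical_eye_width(glare_eye, 100))  # 1: the reflection splits the band
print(vertical_eye_width(glare_eye, 130))  # 3: a different threshold recovers it
```

With the reflection present, the measured width falls from 3 to 1 at a threshold of 100, mimicking a closing eye, while a threshold of 130 restores it; the result thus rests on the threshold choice.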
U.S. Pat. No. 6,130,617 discloses a process for digitizing video in order to extract an image of the driver's eyes. The method appears to be based on recognizing candidate pairs of points after video “binarization” of the facial data: video threshold processing assigns a value to pixels having black levels “similar to that of eyes” and a value of 0 to all other pixels. A major problem is that the procedure extracts very small regions from the full facial frame, and their purported uniqueness depends on distance patterns that may well arise at random in individual video frames, especially because of video noise and the vibration of the driver's head caused by vehicle motion; these patterns are also certain to be distorted by normal driver head motion. Extensive interframe correlation therefore appears to be necessary to validate such points. Interframe correlation is difficult, however, because the driver's frequent head movements have amplitudes greatly exceeding the dimensions of the points of interest and the distances between them. A further difficulty is that, even if eye position could be detected as described, the result would be an incomplete drowsiness detection system, because it fails to integrate the behavior of other facial areas needed to evaluate the drowsy state, including the eyebrows, the mouth, and general head-movement patterns.
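The binarize-then-pair approach, and its vulnerability to isolated noise pixels, can be sketched as follows. This Python code is a hypothetical simplification, not the process of U.S. Pat. No. 6,130,617: each eye is reduced to a single dark pixel, and the threshold, the allowed horizontal separation, and the allowed vertical offset are assumed parameters.

```python
def binarize(frame, threshold):
    """Map pixels darker than `threshold` to 1 and all others to 0."""
    return [[1 if v < threshold else 0 for v in row] for row in frame]


def candidate_eye_pairs(binary, min_sep, max_sep, max_dy):
    """Return pairs of set pixels whose spacing is plausible for two eyes.

    Horizontal separation must lie in [min_sep, max_sep] and the vertical
    offset must not exceed max_dy. Each pair lists the left point first.
    """
    points = [(r, c) for r, row in enumerate(binary) for c, v in enumerate(row) if v]
    pairs = []
    for i, p in enumerate(points):
        for q in points[i + 1:]:
            left, right = sorted((p, q), key=lambda pt: pt[1])
            dx = right[1] - left[1]
            dy = abs(right[0] - left[0])
            if min_sep <= dx <= max_sep and dy <= max_dy:
                pairs.append((left, right))
    return pairs


frame = [[200] * 14 for _ in range(5)]
frame[2][2] = frame[2][7] = 30   # the two "eyes"
frame[3][12] = 35                # a single dark noise pixel
print(candidate_eye_pairs(binarize(frame, 60), 4, 6, 1))
# [((2, 2), (2, 7)), ((2, 7), (3, 12))]
```

One noise pixel is enough to produce a second candidate pair that satisfies the same distance rule as the true eyes, which is why such points would require the interframe validation discussed above.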
U.S. Pat. No. 5,859,921 discloses variable video filtering that converts a given signal element either to the maximum value in its surrounding region or to the minimum value in its surrounding region, which appears to be a type of lateral inhibition filter. FIG. 3 of that patent shows a filtering mechanism which, according to its FIG. 4, appears to produce a binary, “all-or-none” signal level for the points of interest. This filtering procedure is claimed to compensate for varying lighting conditions while still allowing extraction of the desired feature. The filter has a predetermined processing length (for example, an eye dimension) so as to exclude larger areas, such as hair, from processing. A control signal sets the filter to a maximum or minimum extraction mode. The filter output is used to derive X-axis and Y-axis histograms of points within relevant regions, and correlating the x and y histogram values allows localization of relevant structures such as the eyebrows, pupils, and nostrils. A major problem with this method is that, for each video frame, a large number of time-intensive, interdependent calculations must be made, with several internal feedback loops, merely to compensate for light variations and other random events. The result is a histogram curve that is inherently ambiguous, because its multiple points must in turn be analyzed. The alternatives to such analysis are to take an average of the histogram curve, which is again ambiguous, or to rely on a single point at the maximum of the histogram curve. Moreover, all of this depends on achieving an extremely high signal-to-noise ratio in the original signal and on compensating for variable angles of the face. The latter is accomplished by computing the axes of the face from centroid calculations, but real driving conditions can distort the centroid calculation.
Thus, each of these calculations depends heavily on a series of previous calculations, any one of which is subject to multiple sources of error. Further, since the method depends on the final discrimination of the locations of only a few points of relatively limited dimension, it appears likely that noise introduced into the video signal by local lighting conditions and by car-induced vibration of the face, irrespective of driver-initiated facial movements, would confound a significant fraction of the described intraframe analyses.
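One common way to build a maximum/minimum filter of predetermined length that passes small dark features (such as a pupil) while rejecting larger dark areas (such as hair) is a grayscale morphological closing, that is, a maximum filter followed by a minimum filter, minus the original signal (a “black top-hat”). The Python sketch below assumes this construction, applies it along image rows only, and then forms X-axis and Y-axis histograms. Whether it matches the filter of U.S. Pat. No. 5,859,921 is an assumption, and `window`, `level`, and the helper names are hypothetical.

```python
def sliding(values, window, fn):
    """Apply fn (max or min) over a centered window; window should be odd."""
    half = window // 2
    n = len(values)
    return [fn(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def dark_feature_map(image, window):
    """Row-wise closing minus original; responds to dark features narrower than window."""
    out = []
    for row in image:
        closed = sliding(sliding(row, window, max), window, min)  # max then min filter
        out.append([c - v for c, v in zip(closed, row)])
    return out


def projections(feature_map, level):
    """X histogram (per column) and Y histogram (per row) of responses >= level."""
    x_hist = [sum(1 for row in feature_map if row[c] >= level)
              for c in range(len(feature_map[0]))]
    y_hist = [sum(1 for v in row if v >= level) for row in feature_map]
    return x_hist, y_hist


bright = [200] * 12
# Column 2: a one-pixel dark "pupil"; columns 5-10: a wide dark area such as hair.
middle = [200, 200, 50, 200, 200, 40, 40, 40, 40, 40, 40, 200]
fmap = dark_feature_map([bright, middle, bright], window=3)
print(fmap[1])  # [0, 0, 150, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(projections(fmap, level=100))
# ([0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 0])
```

Even in this idealized case the histograms only localize candidate rows and columns; with real images several peaks could appear, each needing further interpretation, which is the ambiguity noted above.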
U.S. Pat. No. 5,859,686 discloses a method that formulates a reference matrix of values corresponding to a potential eye location. For each video frame, an X by Y matrix of values is compared to an X by Y block of pixels. The frame is scanned by comparing successive pixel blocks to the matrix values in order to find regions whose intensity blocks correspond to the subject's pupil and a portion of the iris. As described, this method requires (a) that sufficient resolution and discrimination be achievable in a real driving setting to produce a stable reference matrix for discriminating the pupil and iris, (b) that the driver's head remain stable enough for the matrix comparison with real-time values to be performed sequentially over the frame as described, (c) that frame-to-frame correlation (stability) be adequate to provide a stable comparison matrix, (d) that a blink pattern be discriminated to validate and confirm the matrix correlation, and (e) that, even given the foregoing, the detected eye movements be sufficient to discriminate drowsiness. No real driving data are presented that correlate the values obtained by the described system with drowsiness, nor is there any indication that reliable pupil-iris data can be obtained by this method.
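The block-by-block scan against a reference matrix amounts to template matching. The Python sketch below scores each X by Y block with the sum of absolute differences (SAD). The scoring metric, the function name `best_match`, and the toy frame are assumptions for illustration; the comparison measure used in U.S. Pat. No. 5,859,686 is not specified here.

```python
def best_match(frame, template):
    """Slide `template` over `frame`; return ((row, col), score) of the block
    with the lowest sum of absolute differences (first one found on ties)."""
    th, tw = len(template), len(template[0])
    fh, fw = len(frame), len(frame[0])
    best_pos, best_score = None, None
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            score = sum(abs(frame[r + i][c + j] - template[i][j])
                        for i in range(th) for j in range(tw))
            if best_score is None or score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


# Hypothetical 2 x 2 reference matrix standing in for a pupil-iris pattern.
reference = [[20, 60],
             [60, 20]]
frame = [[200] * 6 for _ in range(6)]
for i in range(2):
    for j in range(2):
        frame[3 + i][1 + j] = reference[i][j]  # embed the pattern at row 3, col 1
print(best_match(frame, reference))  # ((3, 1), 0)
```

Every block position must be evaluated for every frame, and any change in lighting or head position raises the best score (a uniform brightening by 30 gray levels, for instance, raises it from 0 to 120 here), which relates to the stability requirements (b) and (c) listed above.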
U.S. Pat. No. 5,805,720 discusses video threshold processing that locates the coordinates of a centroid in a region of pixels whose intensity is consistent with an eye. However, the exact criteria for the eye-specific pixel thresholds are not disclosed in enough detail to determine a computation method. The same problem arises in discerning how the evaluation functions and shape functions are calculated. Moreover, this method again addresses only eye dimensions; it provides no means of compensating for general driver head motion, and it does not measure the other facial features that are incorporated in the present invention, as described below.
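A minimal sketch of locating a centroid among pixels in an intensity band follows. The band limits `lo` and `hi` are assumed parameters, precisely the kind of eye-specific criteria said above to be undisclosed, and `band_centroid` is a hypothetical illustration rather than the method of U.S. Pat. No. 5,805,720.

```python
def band_centroid(image, lo, hi):
    """Centroid (row, col) of pixels with lo <= value <= hi, or None if none qualify."""
    count = row_sum = col_sum = 0
    for r, row in enumerate(image):
        for c, v in enumerate(row):
            if lo <= v <= hi:
                count += 1
                row_sum += r
                col_sum += c
    if count == 0:
        return None
    return (row_sum / count, col_sum / count)


image = [[200] * 5 for _ in range(5)]
for r, c in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    image[r][c] = 30                 # a 2 x 2 dark "eye" region
print(band_centroid(image, 0, 50))   # (1.5, 1.5)
image[4][4] = 35                     # one dark pixel from, e.g., an eyeglass frame
print(band_centroid(image, 0, 50))   # (2.0, 2.0): the centroid is pulled away
```

A single stray dark pixel shifts the computed position, and narrowing the band to exclude it (for example, `hi=32`) restores the original result, so the outcome rests on threshold choices that a computation method would have to specify.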