Vision systems frequently entail locating and tracking an object such as a person's eye in successively generated frames of video data. In the motor vehicle environment, for example, a CCD camera can be used to generate a video image of the driver's face, and portions of the image corresponding to the driver's eyes can be analyzed to assess driver gaze or drowsiness. See, for example, U.S. Pat. Nos. 5,795,306; 5,878,156; 5,926,251; 6,097,295; 6,130,617; 6,243,015; 6,304,187; and 6,571,002, incorporated herein by reference. While eye location and tracking algorithms can work reasonably well in a controlled environment, they tend to perform poorly under real-world imaging conditions, particularly in systems having only one camera. For example, the ambient illumination can change dramatically, the subject may be wearing eyeglasses or sunglasses, and the subject's head can be rotated in a way that partially or fully obscures the eye.
Tracking eye movement from one video frame to the next is generally achieved using a correlation technique in which the eye template (i.e., a cluster of pixels corresponding to the subject's eye) of the previous frame is compared to different portions of a search window within the current frame. A correlation value is computed for each comparison, and the peak correlation value is used to identify the eye template in the current frame. While this technique is useful, the accuracy of the eye template tends to degenerate over time due to drift and conditions such as out-of-plane rotation of the subject's head, noise, and changes in the appearance of the eye (due to glasses, for example). At some point, the eye template will be sufficiently degenerated that the system must enter a recovery mode in which the entire image is analyzed to re-locate the subject's eye.
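The correlation technique described above can be illustrated with a minimal sketch. It assumes grayscale frames held as 2-D NumPy arrays and uses zero-mean normalized cross-correlation as the correlation measure; the cited patents may use a different measure. The function names, the `search_radius` parameter, and the convention of locating the template by its top-left corner are hypothetical and chosen only for illustration.

```python
import numpy as np


def normalized_correlation(patch, template):
    """Zero-mean normalized cross-correlation of two equal-sized arrays."""
    p = patch.astype(np.float64) - patch.mean()
    t = template.astype(np.float64) - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    if denom == 0.0:
        # A flat patch or template has no defined correlation; treat as no match.
        return 0.0
    return float((p * t).sum() / denom)


def track_template(frame, template, prev_pos, search_radius):
    """Compare the previous frame's template against each candidate position
    in a search window of the current frame and return the position with the
    peak correlation, together with that peak value.

    prev_pos is the (row, col) of the template's top-left corner in the
    previous frame (an assumed convention for this sketch).
    """
    th, tw = template.shape
    r0, c0 = prev_pos
    # Clamp the search window so every candidate patch lies inside the frame.
    r_min = max(0, r0 - search_radius)
    r_max = min(frame.shape[0] - th, r0 + search_radius)
    c_min = max(0, c0 - search_radius)
    c_max = min(frame.shape[1] - tw, c0 + search_radius)

    best_score, best_pos = -1.0, (r0, c0)
    for r in range(r_min, r_max + 1):
        for c in range(c_min, c_max + 1):
            score = normalized_correlation(frame[r:r + th, c:c + tw], template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score


if __name__ == "__main__":
    # Synthetic demonstration: the "current" frame is the previous frame
    # shifted down by 2 rows and right by 3 columns.
    rng = np.random.default_rng(0)
    prev_frame = rng.random((40, 40))
    eye_template = prev_frame[10:18, 12:20]
    curr_frame = np.roll(prev_frame, shift=(2, 3), axis=(0, 1))
    pos, peak = track_template(curr_frame, eye_template, (10, 12), search_radius=5)
    print(pos, peak)
```

In a tracker of this kind, the peak value returned alongside the position could be compared against a threshold (a hypothetical tuning parameter, not specified in the text) to decide when the template has degenerated far enough to enter the recovery mode.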