Gaze estimation refers to the process of determining where a person is looking in a predefined plane, i.e., a gaze plane. Such a gaze plane may be associated with a scene, e.g., a physical scene as presented in a shop window, or a virtual scene as provided by content displayed on a computer screen. Here, the points in the scene at which a person gazes are referred to as gaze points. Gaze estimation is increasingly important for many applications, such as human-computer interaction, marketing and advertising, and human behavior analysis.
Gaze estimation methods generally involve calibration. The calibration may be camera calibration (estimating the camera parameters), geometric calibration (estimating the relations between scene components such as the camera, the gaze plane, and the user), or personal calibration (determining the angle between the visual and optical axes of the eye). An extensive overview of the different approaches to gaze estimation and their calibration can be found in “In the Eye of the Beholder: A Survey of Models for Eyes and Gaze” by D. W. Hansen et al., IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, Issue 3, pp. 478-500, 2010.
During calibration, users are usually asked to fixate their gaze on certain points in a scene while images of their eyes are captured. This procedure is cumbersome and sometimes impractical. For example, when tracking customers' attention in shops, the gaze points or regions should be estimated passively, i.e., without the user having to actively take part in a calibration.
It is known to use statistical data of eye movement measurements to enable such passive gaze estimation. For example, U.S. Pat. No. 7,657,062 B2 provides a calibration apparatus for automatically self-calibrating a set of eye tracking measurements to a reference space, said apparatus comprising: a tracking device capable of capturing a plurality of eye gaze measurements, representing a plurality of eye gaze positions; a statistical analyzer for determining a statistical distribution of said plurality of eye gaze measurements; a data storage device for storing a predetermined set of statistics data of eye movement measurements; a statistical data comparison component for comparing said statistical distribution data of said plurality of eye gaze measurements with said stored predetermined set of statistical data of eye movement measurements; and a calibration data generating component for generating a calibration data depending upon a result of said comparison.
Hence, U.S. Pat. No. 7,657,062 B2 uses statistical distribution data, of which the relation to the reference space is known, to calibrate the eye gaze measurements of a user. It is mentioned that the predetermined statistics may take two forms. Firstly, a set of statistics may be taken for a plurality of persons in order to obtain “average” statistical information for human users, describing the eye movement patterns of a notional average person. Such statistics are not specific to any one individual person, but may represent a notional average person based upon an average of a test sample comprising a plurality of persons. FIG. 6 of U.S. Pat. No. 7,657,062 B2 shows that the notional average person gazes straight ahead with the highest probability. Secondly, statistics may be derived from measurements taken from just one individual person. In this case, the individual person may have their own particular quirks and idiosyncrasies.
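By way of illustration only, the general idea of calibrating against stored statistics may be sketched as follows. The sketch is not the method of U.S. Pat. No. 7,657,062 B2. It assumes, as a simplification, that the stored reference statistics consist of a per-axis mean and standard deviation in the reference space, and that the calibration data are a per-axis gain and offset chosen so that the distribution of the uncalibrated measurements matches those statistics. The function names fit_axis, calibrate, and apply_calibration are hypothetical.

```python
from statistics import mean, pstdev


def fit_axis(measured, ref_mean, ref_std):
    """Return (gain, offset) such that gain * m + offset has the
    reference mean and standard deviation (assumed reference model)."""
    values = list(measured)
    m_mean = mean(values)
    m_std = pstdev(values)
    if m_std == 0:
        raise ValueError("measurements have no spread; gain is undetermined")
    gain = ref_std / m_std
    offset = ref_mean - gain * m_mean
    return gain, offset


def calibrate(measurements, ref_mean, ref_std):
    """Fit one (gain, offset) pair per axis from uncalibrated (x, y) samples.

    ref_mean and ref_std are (x, y) tuples of the stored reference statistics.
    """
    xs = [p[0] for p in measurements]
    ys = [p[1] for p in measurements]
    return (
        fit_axis(xs, ref_mean[0], ref_std[0]),
        fit_axis(ys, ref_mean[1], ref_std[1]),
    )


def apply_calibration(point, params):
    """Map one uncalibrated (x, y) measurement into the reference space."""
    (gx, ox), (gy, oy) = params
    return (gx * point[0] + ox, gy * point[1] + oy)
```

In this sketch, a reference mean placed at the "straight ahead" position would pull the center of the measured distribution to that position. Matching only the first two moments is a deliberately coarse comparison; a comparison of full distributions would require a richer reference model.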
WO 2013/059940 A1 describes a method for automating a gaze tracking system calibration based on display content and/or user actions that are likely to attract the subject's gaze. The method involves obtaining gaze data, obtaining at least one key point corresponding to a portion of media content being displayed, linking the gaze data to the at least one key point, and generating one or more calibration parameters by comparing gaze data with associated ones of the at least one key point.
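Purely as an illustrative sketch, and not as the implementation of WO 2013/059940 A1, the steps of linking gaze data to key points and deriving calibration parameters may look as follows. The sketch assumes that each gaze sample carries a timestamp, that each key point is known to be displayed during a time interval, and that the calibration parameters are a per-axis gain and offset obtained by a least-squares fit. The names link_gaze_to_key_points, fit_line, and fit_calibration are hypothetical.

```python
def link_gaze_to_key_points(gaze_samples, key_points):
    """Pair each gaze sample with the key point shown at its timestamp.

    gaze_samples: list of (t, raw_x, raw_y)
    key_points:   list of (t_start, t_end, x, y), assumed display intervals
    Returns a list of ((raw_x, raw_y), (key_x, key_y)) pairs; samples that
    fall outside every interval are left unlinked.
    """
    pairs = []
    for t, raw_x, raw_y in gaze_samples:
        for t_start, t_end, key_x, key_y in key_points:
            if t_start <= t < t_end:
                pairs.append(((raw_x, raw_y), (key_x, key_y)))
                break
    return pairs


def fit_line(raw, target):
    """Least-squares (gain, offset) such that gain * raw + offset ~ target."""
    n = len(raw)
    if n < 2:
        raise ValueError("at least two linked samples are required")
    mean_raw = sum(raw) / n
    mean_target = sum(target) / n
    var = sum((r - mean_raw) ** 2 for r in raw)
    if var == 0:
        raise ValueError("raw values are identical; gain is undetermined")
    cov = sum((r - mean_raw) * (t - mean_target) for r, t in zip(raw, target))
    gain = cov / var
    return gain, mean_target - gain * mean_raw


def fit_calibration(pairs):
    """Return ((gain_x, offset_x), (gain_y, offset_y)) from linked pairs."""
    raw_x = [p[0][0] for p in pairs]
    raw_y = [p[0][1] for p in pairs]
    key_x = [p[1][0] for p in pairs]
    key_y = [p[1][1] for p in pairs]
    return fit_line(raw_x, key_x), fit_line(raw_y, key_y)
```

The quality of such a fit depends on the assumption that the subject actually looked at the linked key point; samples taken while the subject looked elsewhere would bias the resulting parameters.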
Disadvantageously, U.S. Pat. No. 7,657,062 B2 and WO 2013/059940 A1 each provide a relatively inaccurate estimation of the gaze of a user viewing a scene.