The problem to be solved is to find the point, the object or, more specifically, the part of an object's surface at which a (possibly moving) person gazes. Existing solutions to this problem are described below; they can be split into separate parts.
First, the gaze direction of the person (or a representation thereof, such as a pupil/corneal reflection (CR) combination, or the cornea center and the pupil or limbus) is to be found.
This gaze direction is then mapped to an image of the scene captured by a head-mounted scene camera or by a scene camera at any fixed location. The head-mounted scene camera is fixed with respect to the eye, and therefore such a mapping can be performed once a corresponding calibration has been executed.
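By way of illustration only, such a calibration could be sketched as follows. The sketch assumes that the eye tracker delivers a two-dimensional eye feature per sample (for example a pupil/CR difference vector) and that an affine model is sufficient. Both are simplifying assumptions, and the function names calibrate and gaze_to_scene are hypothetical; practical eye trackers may use higher-order or geometric models.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve the 3x3 system m * x = rhs with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("calibration samples are degenerate")
    solution = []
    for k in range(3):
        mk = [row[:] for row in m]
        for r in range(3):
            mk[r][k] = rhs[r]
        solution.append(det3(mk) / d)
    return solution


def calibrate(eye_features, scene_points):
    """Least-squares affine fit from eye features to scene-image points.

    eye_features: list of (fx, fy) measured while the subject fixates
                  known targets (assumed input format).
    scene_points: list of (x, y) target positions in the scene image.
    Returns one (a, b, c) triple per output axis, so that
    x = a*fx + b*fy + c, and likewise for y.
    """
    rows = [(fx, fy, 1.0) for fx, fy in eye_features]
    # Normal equations: (A^T A) coeffs = A^T b
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    coeffs = []
    for axis in range(2):
        atb = [sum(r[i] * p[axis] for r, p in zip(rows, scene_points))
               for i in range(3)]
        coeffs.append(solve3(ata, atb))
    return coeffs


def gaze_to_scene(coeffs, feature):
    """Map one eye feature to a gaze point in the scene image."""
    fx, fy = feature
    return tuple(a * fx + b * fy + c for a, b, c in coeffs)
```

Because the head-mounted scene camera is fixed with respect to the eye, coefficients obtained in this way would remain valid while the subject moves, which is why the calibration need only be executed once.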
The next step is to map the gaze point in the scene image as captured by the head-mounted camera, which can change due to the movement of the subject, to a point in a (stable) reference image which does not move and which corresponds to a “real world” object.
Eye trackers can be used to determine the gaze direction. Eye trackers observe features of the eye such as the pupil, the limbus, blood vessels on the sclera or reflections of light sources (corneal reflections) in order to calculate the direction of the gaze.
Any kind of eye tracker can be used if it allows mapping the gaze direction into images of a head-mounted scene camera.
If the head of the subject does not move, then once the calibration is done, determining the gaze direction directly yields the gaze point in the reference image. In this special case the calibration itself provides the mapping from a gaze direction to a point in the reference image, because the scene image and the reference image are identical: the head-mounted camera does not move and therefore has a fixed location with respect to the reference image.
If, however, the head and the eye move, it becomes more complicated to determine the gaze point in a reference image which does not move. The gaze direction is then detected with respect to a scene image taken by the head-mounted camera after the head has moved, and this scene image is no longer identical to the reference image which was used for calibrating the gaze direction with respect to the corresponding gaze point in the reference image.
One possible approach to determining the point gazed at is to intersect the gaze direction with a virtual scene plane defined relative to the eye tracker. WO 2010/083853 A1 discloses the use of active IR markers for that purpose, which are fixed at certain locations, e.g. attached to a bookshelf. The locations of these markers are first detected with respect to a “test scene”, which acts as a “reference” image obtained by the head-mounted camera, by use of two orthogonal IR line detectors which determine the two orthogonal angles by detecting the maximum intensity on each of the two line sensors. The detected angles of an IR source correspond to its location in the reference image. The angles of the markers are then detected for a later scene taken by the head-mounted camera from a different position, thereby detecting the locations of the IR sources in the later scene image. Next, the “perspective projection” is determined, which is the mapping that transforms the locations of the IR sources as detected in an image taken later (a scene image), when the head-mounted camera is at a different location, to the locations of the IR light sources in the test image (or reference image). With this transformation a gaze point determined later for the scene image can also be transformed into the corresponding (actual) gaze point in the test image.
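For illustration, a transformation of this kind can be sketched as a planar projective transform estimated from four marker correspondences. The sketch below is not taken from WO 2010/083853 A1; it assumes that the detected marker angles have already been converted into two-dimensional coordinates, that exactly four markers in general position are visible in both images, and it uses hypothetical function names (solve_linear, estimate_projection, map_point).

```python
def solve_linear(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate marker configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_projection(scene_pts, ref_pts):
    """Estimate the 3x3 transform taking scene-image marker locations
    to reference-image marker locations (four correspondences)."""
    a, b = [], []
    for (x, y), (u, v) in zip(scene_pts, ref_pts):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), same form for v
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def map_point(h, point):
    """Apply the transform to a gaze point found in the scene image."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (u, v)


# Hypothetical marker locations: the later scene image shows the
# markers scaled by 2 and shifted by (10, 20) relative to the reference.
reference_markers = [(0, 0), (100, 0), (100, 100), (0, 100)]
scene_markers = [(10, 20), (210, 20), (210, 220), (10, 220)]
projection = estimate_projection(scene_markers, reference_markers)
gaze_in_reference = map_point(projection, (110, 120))
```

In this example the gaze point (110, 120) found in the later scene image maps to approximately (50, 50) in the reference image. With more than four markers, a least-squares estimate would normally be preferred over the exact solve shown here.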
The mapping of the gaze point from the actual “scene image” to a stable, time-invariant reference image becomes possible by defining the plane onto which the gaze point is mapped in relation to scene-stable markers instead of in relation to the eye tracker (ET). In this way the plane of the reference image becomes stable over time, and the gazes of other participants can also be mapped onto it, so that the gaze point information can be aggregated over time as well as over participants, as was previously possible only with eye trackers located at a fixed position.
For that purpose the prior art as disclosed in WO 2010/083853 A1 uses IR sources as artificial markers, the locations of which are detected by orthogonal IR line detectors that determine the angles of maximum emission.
The use of IR sources as markers for determining the transform of the gaze point from a scene image to a reference image is complicated and inconvenient. It requires artificial IR light sources to be mounted, and it requires an additional IR detector comprising two orthogonal line sensors. It is therefore desirable to provide an approach which can determine a gaze point mapping without such external markers, even if the head-mounted scene camera moves.