There are existing solutions to the problem of finding the point, the object or, more specifically, the part of an object's surface that a (possibly moving) person gazes at. Such solutions are described below and can be split into separate parts.
First, the gaze direction of the person (or a representation thereof, such as a pupil/CR combination, cornea center and pupil/limbus, etc.) is to be found.
For determining the gaze direction, eye trackers can be used. Eye trackers observe features of the eye such as the pupil, the limbus, blood vessels on the sclera, the eyeball or reflections of light sources (corneal reflections) in order to calculate the direction of the gaze.
This gaze direction is then mapped to an image of the scene captured by a head-mounted scene camera or a scene camera at any fixed location. The head-mounted scene camera is fixed with respect to the head, and therefore such a mapping can be performed, once a corresponding calibration has been executed. For performing the calibration a user may have to gaze at several defined points in the scene image captured by the head-mounted camera. By using the correspondingly detected gaze directions the calibration can be performed resulting in a transformation which maps a gaze direction to a corresponding point in the scene image. In this approach any kind of eye tracker can be used if it allows mapping the gaze direction into images of a head-mounted scene camera.
This approach enables the determination of a gaze point in the scene image as taken by the head-mounted scene camera.
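The calibration described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the gaze direction is available as a two-dimensional feature (for example a pupil/CR vector) and that an affine transformation is sufficient; real eye trackers may use other representations and higher-order models. The function names `solve_linear`, `fit_calibration` and `map_gaze` are hypothetical and not taken from any particular product.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: calibration points are degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_calibration(gaze_features, scene_points):
    """Least-squares affine map from gaze features to scene-image points.

    gaze_features: (gx, gy) pairs measured while the user fixates known targets.
    scene_points:  (u, v) positions of those targets in the scene image.
    Returns one coefficient triple (c0, c1, c2) per image axis.
    """
    rows = [(1.0, gx, gy) for gx, gy in gaze_features]
    # Normal equations (A^T A) c = A^T b, solved once per image axis.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for axis in range(2):
        atb = [sum(r[i] * p[axis] for r, p in zip(rows, scene_points))
               for i in range(3)]
        coeffs.append(solve_linear(ata, atb))
    return coeffs


def map_gaze(coeffs, gaze_feature):
    """Map a gaze feature to a point (u, v) in the scene image."""
    gx, gy = gaze_feature
    return tuple(c[0] + c[1] * gx + c[2] * gy for c in coeffs)
```

After `fit_calibration` has been run once on the calibration fixations, `map_gaze` can be applied to every subsequently measured gaze feature to obtain the gaze point in the scene image. At least three non-collinear calibration targets are needed for the affine model assumed here.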
As a next step it can be of interest to map the gaze point in the scene image as captured by the head-mounted scene camera, which can change due to the movement of the subject, to a point in a (stable) reference image which does not move and which corresponds to a “real world” object or an image thereof. The reference image is typically taken from a different camera position than the scene image taken by the head-mounted scene camera, because the scene camera may move together with the head of the user.
For such a case where the head moves, there are known approaches for determining the gaze point in a reference image which does not move based on the detection of the gaze direction with respect to a certain scene image as taken by the head-mounted scene camera even after the head has moved.
One possible approach to determining the point gazed at is to intersect the gaze direction with a virtual scene plane defined relative to the eye tracker. WO 2010/083853 A1 discloses the use of active IR markers for that purpose, which are fixed at certain locations, e.g. attached to a bookshelf. The locations of these markers are first detected with respect to a “test scene”, which acts as a “reference” image obtained by the head-mounted camera, by use of two orthogonal IR line detectors which detect the two orthogonal angles by detecting the maximum intensity on the two line sensors. The detected angles of an IR source correspond to its location in the reference image. The angles of the markers are then detected for a later scene taken by the head-mounted camera from a different position, thereby detecting the locations of the IR sources in the later scene image. From these, the “perspective projection” is determined, which is the mapping that transforms the locations of the IR sources as detected in an image taken later (a scene image), when the head-mounted camera is at a different location, to the locations of the IR light sources in the test image (or reference image). With this transformation a gaze point as determined later for the scene image can also be transformed into the corresponding (actual) gaze point in the test image.
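A perspective projection of this kind between two views of a plane can be written as a 3x3 homography. The following is a minimal sketch, assuming exactly four markers whose locations are known in both the scene image and the reference image, and assuming the last matrix entry can be fixed to 1. The function names are hypothetical, and the sketch is not the procedure of WO 2010/083853 A1 itself.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate marker configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_homography(scene_pts, ref_pts):
    """Homography H (with H[2][2] = 1) mapping four scene points to reference points."""
    if len(scene_pts) != 4 or len(ref_pts) != 4:
        raise ValueError("exactly four marker correspondences are expected")
    a, b = [], []
    for (x, y), (u, v) in zip(scene_pts, ref_pts):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), rearranged to be linear in h.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        # v = (h3 x + h4 y + h5) / (h6 x + h7 y + 1), rearranged likewise.
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(h, point):
    """Transfer a point (e.g. a gaze point) from the scene image to the reference image."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (u, v)
```

With the marker locations of the current scene image passed as `scene_pts` and those of the test image as `ref_pts`, the resulting matrix can be handed to `apply_homography` together with the gaze point measured in the scene image. The mapping is only valid for points lying in the plane spanned by the markers.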
The mapping of the gaze point from the actual “scene image” to a stable, time-invariant reference image becomes possible by defining the plane onto which the gaze point is mapped in relation to scene-stable markers instead of in relation to the eye tracker (ET). In this way the plane of the reference image becomes stable over time, and the gazes of other participants can also be mapped onto it, so that the gaze point information can be aggregated over time as well as over participants, as was previously possible only with eye trackers located at a fixed position.
For that purpose the prior art as disclosed in WO 2010/083853 A1 uses IR sources as artificial markers the locations of which can be detected by orthogonal IR line detectors to detect the angles of maximum emission.
Using IR sources as markers for determining the transform of the gaze point from a scene image to a reference image is complicated and inconvenient.
European patent application no. EP11158922.2, titled “Method and Apparatus for Gaze Point Mapping” and filed by SensoMotoric Instruments Gesellschaft für innovative Sensorik mbH, which is incorporated herein by reference, describes a different approach. In this approach there is provided an apparatus for mapping a gaze point of a subject on a scene image to a gaze point in a reference image, wherein said scene image and said reference image have been taken by a camera from a different position, said apparatus comprising:
a module for executing a feature detection algorithm on said reference image to identify a plurality of characteristic features and their locations in said reference image;
a module for executing said feature detection algorithm on said scene image to re-identify said plurality of characteristic features and their locations in said scene image;
a module for determining a point transfer mapping that transforms point positions between said scene image and said reference image based on the locations of said plurality of characteristic features detected in said reference image and said scene image;
a module for using said point transfer mapping to map a gaze point which has been determined in said scene image to its corresponding point in said reference image.
This enables the implementation of gaze point mapping which does not need any artificial IR sources and IR detectors. It can operate on normal, unmodified images of natural scenes taken by normal CCD cameras operating in the visible frequency range. For a detailed description of this approach reference is made to European patent application no. EP11158922.2.
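The re-identification of characteristic features in the scene image can be illustrated with a minimal sketch. It assumes, as a simplification not stated in the text above, that each detected feature carries a numeric descriptor vector and that features are matched by nearest descriptor distance with a ratio check against the second-best candidate. The function name `match_features` and the threshold `max_ratio` are hypothetical, and the sketch is not the method of the cited application.

```python
import math


def match_features(ref_features, scene_features, max_ratio=0.8):
    """Pair reference features with their re-identified scene features.

    Each feature is a tuple (location, descriptor), where location is (x, y)
    and descriptor is a sequence of floats (an assumption of this sketch).
    A match is kept only if the best scene candidate is clearly closer than
    the second best, which discards ambiguous features.
    Returns a list of (ref_location, scene_location) pairs.
    """
    matches = []
    for ref_xy, ref_desc in ref_features:
        # Rank all scene features by descriptor distance to this reference feature.
        ranked = sorted((math.dist(ref_desc, desc), xy)
                        for xy, desc in scene_features)
        if len(ranked) >= 2 and ranked[0][0] < max_ratio * ranked[1][0]:
            matches.append((ref_xy, ranked[0][1]))
    return matches
```

The resulting location pairs are the kind of input from which a point transfer mapping between the scene image and the reference image could then be estimated.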
But even with this approach it is only possible to map the gaze of a moving subject to a certain predefined static plane. The determination of a gaze endpoint at any arbitrary object in 3D space is not possible.
It is therefore an object of the invention to provide an approach which can determine the gaze endpoint at any arbitrary three-dimensional object in 3D space.