Embodiments of the present invention relate to tracking a user's gaze when the user is observing a 3D scene, and in particular to determining whether the user's gaze is directed toward a zone of interest in the 3D scene.
It is known to detect an eye and its gaze direction. This can be done, for example, by: illuminating a region in which the eye is sought with infrared radiation; capturing an image of the region; and detecting bright spots in the image that derive from the pupil and cornea of the eye. This approach exploits the bright-eye or “red-eye” effect known to photographers, whereby light enters the eye and is reflected or absorbed and re-emitted through the pupil, making the pupil appear brighter than the rest of the eye and the face. A separate, smaller bright spot (also referred to as a glint) is created by the cornea. The relative positions of the pupil and the corneal glint can be used to determine the direction of the gaze of the eye. More details are given in U.S. Pat. No. 6,152,563, the entire contents of which are hereby incorporated by reference, for all purposes, as if fully set forth herein.
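By way of illustration only, the use of the relative positions of the pupil and the corneal glint can be sketched as follows. This is a simplified sketch and is not taken from the cited patent: the function names, the per-axis linear mapping, and the `gain` and `offset` calibration values are all assumptions introduced for the example, and practical gaze trackers generally use more elaborate models.

```python
def pupil_glint_vector(pupil_centre, glint_centre):
    """Return the 2D offset (in image pixels) from the glint to the pupil centre."""
    return (pupil_centre[0] - glint_centre[0],
            pupil_centre[1] - glint_centre[1])


def estimate_gaze_point(vector, gain, offset):
    """Map a pupil-glint vector to a screen position.

    Assumption: a per-axis linear mapping whose gain and offset were
    obtained from a hypothetical per-user calibration step.
    """
    return (gain[0] * vector[0] + offset[0],
            gain[1] * vector[1] + offset[1])


# Hypothetical detections in a captured image (pixel coordinates).
vector = pupil_glint_vector(pupil_centre=(322, 240), glint_centre=(318, 243))

# Hypothetical calibration values for a 1920x1080 screen.
gaze_point = estimate_gaze_point(vector, gain=(100.0, 100.0), offset=(960.0, 540.0))
```

In this sketch the glint serves as a reference point, so that a change in the pupil-glint offset, rather than the absolute pupil position in the image, drives the estimated gaze point.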
Alternatively, or in addition, a similar technique may be used whereby infrared illuminators are spaced from an image sensor, so that an image captured by the image sensor has a non-bright pupil, otherwise known as a “dark pupil” effect.
This gaze tracking technology may be implemented in a remote gaze tracker located adjacent to a display, for example, or in a wearable device such as a pair of glasses, virtual reality headset, augmented reality headset, helmet or the like.
Such gaze tracking technology can be used to determine if a user is looking at a particular object or area on a screen (these objects or areas are generically referred to as ‘zones’ in the present document). This could be as part of a game, for example. This allows users to interact with images on a screen by looking at them (the act of looking at the image having a predetermined result) or by a combination of looking at an image and another control (e.g., a user pressing a key on a keyboard or mouse whilst their gaze is directed at the image).
Typically, an image on a screen may contain gaze-interactable zones as well as zones which are not gaze-interactable.
Previous methods for determining whether a user's gaze is directed to a gaze-interactable zone in an image tend to be based upon the need for a developer to specify an ‘interaction mask’ to indicate the location on the screen of the interactable elements. These can work well within certain constraints. Those constraints include use of a static “camera” (i.e., the view point from which the image on the screen is determined), and maintaining a small number of moveable objects that are “occluders” (elements that need to be marked as gaze-interactable, but only for the purpose of transmitting visual culling information, not to be “interactable” per se). Also, such systems typically rely on the gaze-interactable objects being visible in the visual scene.
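As a rough illustration of the interaction-mask approach described above, the following sketch stores a zone identifier for each screen pixel and looks up the pixel at the gaze location. The rectangular zones, the function names and the mask layout are assumptions made for the example only; the sketch also assumes a static camera, so that the mask does not need to be rebuilt as the view changes.

```python
def build_mask(width, height, zones):
    """Build a per-pixel mask; each cell holds a zone id or None.

    zones maps a zone id to a rectangle (x0, y0, x1, y1) in screen pixels,
    with x1 and y1 exclusive. Later zones overwrite earlier ones.
    """
    mask = [[None] * width for _ in range(height)]
    for zone_id, (x0, y0, x1, y1) in zones.items():
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = zone_id
    return mask


def zone_at(mask, gaze_x, gaze_y):
    """Return the zone id under the gaze point, or None if there is none."""
    if 0 <= gaze_y < len(mask) and 0 <= gaze_x < len(mask[0]):
        return mask[gaze_y][gaze_x]
    return None


# A tiny 8x6 "screen" with one hypothetical gaze-interactable zone.
mask = build_mask(8, 6, {"button": (1, 1, 4, 3)})
looked_at = zone_at(mask, 2, 2)
```

Because the mask is defined in screen coordinates, a gaze point outside the displayed image, or a zone that is not drawn on the screen, cannot be resolved by this kind of lookup.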
Further, it has previously been possible to poll or otherwise project a line from a virtual camera to determine the objects within a scene with which it intersects. However, due to an inherent lack of 100% accuracy in gaze tracking technology, it is preferable to poll an area of a scene so as to account for an error or offset in a user's determined gaze location. In effect, this requires searching within a cone shape projected from the virtual camera. This is a processing-intensive and inefficient solution.
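The cone search described above can be illustrated with the following minimal sketch, which tests each object against a cone projected from the virtual camera along the gaze direction. The function names, the representation of objects by a single centre point, and the half-angle values are assumptions made for the example; the sketch also assumes that no object centre coincides with the camera position.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def angle_to_point(origin, direction, point):
    """Angle in degrees between the gaze direction and the line to point."""
    d = normalize(direction)
    to_point = normalize(tuple(p - o for p, o in zip(point, origin)))
    cos_angle = sum(a * b for a, b in zip(d, to_point))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.degrees(math.acos(cos_angle))


def objects_in_cone(origin, direction, centres, half_angle_deg):
    """Return the names of objects whose centres lie inside the gaze cone."""
    return [name for name, centre in centres.items()
            if angle_to_point(origin, direction, centre) <= half_angle_deg]


# Hypothetical scene: camera at the origin looking along +z.
centres = {"door": (0.1, 0.0, 5.0), "lamp": (3.0, 0.0, 5.0)}
hits = objects_in_cone((0, 0, 0), (0, 0, 1), centres, half_angle_deg=2.0)
```

In this sketch every object in the scene is tested for every gaze sample, which indicates why the cost of such a search grows with the amount of geometry in the scene.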
However, in 3D, where the camera can rotate through a scene, such as in many computer games or in virtual reality headsets, and there is typically an abundance of geometry defining a scene which can act as occluders, previous methods are not so successful. Even without occluders, the 3D situation is problematic. Creating the necessary masks to cope with the 3D environment and the varying locations of objects (be they interactors or occluders) from different view points becomes very complex.
This is because, for example, the number of actors (game entities that can interact with the player in an active fashion) in a 3D game is typically much higher than in 2D applications. This, in effect, means that every object in the scene needs to be considered as a potential occluder. In contrast, the objects actually intended to be gaze-interactable, such as parts of other characters (a character is an actor that can be possessed by either an AI or a player), might comprise as little as 5-10% of each scene. Consequently, ten times or more bandwidth is required for occluders than for gaze-interactable objects. This is computationally inefficient and cumbersome for game developers to implement.
Further, some entities in a game scene, such as world geometry (houses, mountains, etc.), do not expose renderbounds or physics bounds natively, which means there is a need to project these meshes to the screen to create interactor occluders for them. This can be extremely computationally expensive. Further, some of this world geometry is extremely unwieldy (mountains, etc.), so to project it in a meaningful way (to get proper occluders) it would become necessary to first employ mesh splitting algorithms before performing any projection. This becomes impractical.
Finally, it can be desirable in certain scenarios to know if a user's gaze is directed to an object or region which is not visually rendered in the image shown on the monitor or screen. For example, the object or region may be invisible, or may no longer be within the bounds of the image displayed on the screen.
Therefore, there is a problem of how to efficiently implement gaze tracking in 3D scenarios. The present invention aims to at least partially overcome this problem.