1. Field
The present disclosure relates to a method for gaze tracking, suitable for being built into different devices, such as smartphones, tablets, personal computers and television screens, or for any environment where the gaze can be used to control the operation of any kind of apparatus, such as vehicles and so on. Generally speaking, the present method for gaze tracking is intended to be applied to interactive interfaces and operating systems.
The present disclosure also concerns a method for operating a device provided with at least a digital camera producing a video stream, so as to obtain gaze tracking when a face is captured in said video stream, through both the camera and the processor of the device.
2. Description of the Prior Art
Current studies and products using the analysis of gaze patterns are mostly implemented in controlled laboratory type situations.
For example, many studies are done to determine the effectiveness of website layouts: such controlled tests, with their known methodological issues, make the subjects conscious of being tested, thus changing their behaviour and influencing the very results the experiment is meant to produce.
Current gaze tracking solutions predominantly work by projecting infrared light, which creates reflections (glints) within and on the eye that can be tracked by algorithms such as blob detection. The number of glints can be increased with extra infrared sources, to improve the tracking and to allow some tolerance for head movements.
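As a purely illustrative sketch of the blob-detection step mentioned above (a minimal thresholding approach on a synthetic image, not the implementation of any particular prior-art product), the bright infrared glint can be located as the centroid of the pixels above an intensity threshold:

```python
import numpy as np

def detect_glint(gray, threshold=240):
    """Crude glint detector: threshold the image and return the
    centroid (x, y) of the bright pixels (the IR reflection)."""
    ys, xs = np.nonzero(gray >= threshold)
    if len(xs) == 0:
        return None  # no glint visible in this frame
    return (float(xs.mean()), float(ys.mean()))

# Synthetic 64x64 eye patch: dark background with a small bright glint.
img = np.full((64, 64), 40, dtype=np.uint8)
img[30:33, 20:23] = 255  # glint centred at roughly (21, 31)

print(detect_glint(img))  # → (21.0, 31.0)
```

In practice, prior-art systems track several such glints per eye and relate their positions to the pupil centre, which is why multiple infrared sources are used.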
Gaze tracking using infrared typically requires a remote setup where the camera is placed further away from the user, usually below the screen. It requires that the light sources for the IR illumination be placed in positions where the glints are clearly visible when looking at the four corners of the screen.
Solutions using infrared for gaze tracking require a number of infrared projections so as to allow a reasonable box of movement of the head relative to the camera. Even when a larger box is created, any change in lighting conditions will mean that re-calibration is required.
Solutions which do not use infrared reflection are mostly based on head tracking, i.e. on recognizing and following features of the face with methods such as Active Appearance Models. However, using head orientation tracking for directional input is not the same thing as gaze tracking, which is the following of the eye direction only, regardless of the head's movement.
Further known methods classify the maximum eye gaze position, recognizing the difference between eyes in up/down/left/right orientation; such a solution can only be used for identifying up-down or left-right scrolling directions, something quite different from accurate gaze tracking.
Methods not using infrared often resort to stereo vision to increase accuracy, which in any case remains limited, while making the hardware more complex.
Other non-infrared methods for gaze tracking are substantially based on the recognition of face features such as eyebrows, chin, pupil, corners of the eyes and so on. They necessarily have a lower accuracy, due to the difficulty in recognizing the corners of the eyes, and a lower robustness to light changes and to different types of faces. They also require that the full face be visible. The accuracy for up/down movements is also lower with such methods, since the relative vertical movement of the pupil is small while the eyelid position also adapts itself to the eye movement.
Further, there are a number of barriers preventing the integration of infrared hardware in mobile devices. Integrating gaze tracking using infrared means higher costs and extra battery drain. In addition, high research and development costs are generally required to create the miniaturized hardware, with current state-of-the-art hardware still being too large to be integrated into mobile devices, especially because a reasonably powerful infrared light source, and often more than one, is required.
The same is true in the case of using an extra video camera for stereo vision, as it adds hardware costs and extra battery drain to the mobile device, making a software solution much more desirable.
Although there is no definitive study yet concluding whether continued exposure to infrared light from a short distance can result in eye damage, customers might have concerns, considering also that young children become mobile device users at ever earlier ages, while any damage is usually considered proportional to the exposure time to the IR light, which for some users amounts to hours per day.
Methods such as stereo vision are used to improve this accuracy, but any expert in the field of gaze tracking will realize that, even with perfect recognition of pupil positions and eye corners, the accuracy and resolution of a gaze direction calculated from pixel positions will always be too limited to be of practical use, and in the spatial domain such methods will be inherently slow. A method of this kind will have trouble distinguishing a pupil movement due to a change in gaze direction from a movement of the head. Recognizing up-down movement of the eye will also be troublesome with such methods, as the eyelid has a greater effect on the visual image of the eye than the pupil.
The potential accuracy of techniques which project infrared light onto the eyes is also limited by uncertainty factors regarding the curvature of the inside and the outside of the eye. For this reason, methods using infrared projection often require several infrared projectors and a careful calibration procedure. They also require the light conditions to remain stable after calibration and the user to remain within a relatively small movement box in front of the screen. This makes gaze tracking based on the recognition of infrared glints in the eye impractical for full-mobility, real-world use on mobile devices.
A software-only solution for gaze tracking is also required in view of another remarkable drawback of using infrared projection for gaze tracking: the infrared filter has to be removed from the camera lens so as to allow the capture of the infrared reflection on the eyes. Removing the infrared filter deteriorates the quality of photos taken by the device. Considering the importance placed on the quality of photos taken by users of mobile devices, this is also a highly limiting factor for the adoption of infrared projection for gaze tracking.
In any case, a person skilled in the art of gaze tracking recognizes that, with a face at a distance of about 30 centimeters from a mobile screen, a gaze movement from side to side of the small screen of a mobile device will move the pupil by only a small number of pixels in the image captured by a camera placed beside the screen itself. Further, attempting to use methods based on image processing in the spatial domain requires not only that the pupil be recognized but also that the corners of the eyes be clearly identified.
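The order of magnitude of this limitation can be checked with a back-of-the-envelope calculation; the screen width, viewing distance, eyeball radius and camera resolution below are illustrative assumptions, not values taken from the present disclosure:

```python
import math

# Illustrative assumptions (not values from this disclosure):
screen_width_cm = 6.0    # width of a small mobile screen
view_dist_cm = 30.0      # face-to-screen distance
eyeball_radius_cm = 1.2  # typical human eyeball radius
px_per_cm = 30.0         # camera resolution at the plane of the face

# Half-angle swept by the gaze when scanning from screen centre to edge.
half_angle = math.atan((screen_width_cm / 2) / view_dist_cm)

# Corresponding lateral travel of the pupil on the eyeball surface,
# for a full side-to-side gaze sweep, expressed in camera pixels.
pupil_travel_cm = 2 * eyeball_radius_cm * math.sin(half_angle)
pupil_travel_px = pupil_travel_cm * px_per_cm

print(round(pupil_travel_px, 1))  # → 7.2
```

Under these assumptions the full side-to-side gaze sweep moves the pupil by only about seven camera pixels, which is of the same order as the recognition error of the eye corners discussed below.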
However, the corners of the eyes are difficult to recognize with common recognition methods, such as Viola-Jones, quickly resulting in several pixels of error.
Object recognition methods mostly analyse pixel-level information in the spatial domain, usually after conversion to grey-scale. Some of these methods, such as feature extraction with the Viola-Jones algorithm, require cascade classifiers trained with techniques such as AdaBoost. Other methods extracting geometrical features, such as Active Shape Models, rely on the correlation between classified feature points and a 3D shape model. These methods inherently require relatively heavy calculations and a lot of optimization work.
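To illustrate the pixel-level, spatial-domain nature of such methods, the sketch below (a minimal example on a synthetic image, not an implementation of any particular prior-art product) computes a two-rectangle Haar-like feature over an integral image, the basic building block on which the Viola-Jones cascade operates:

```python
import numpy as np

def integral_image(gray):
    """Summed-area table: entry (y, x) holds the sum of gray[:y+1, :x+1]."""
    return gray.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y),
    obtained in O(1) from the (zero-padded) integral image."""
    p = np.pad(ii, ((1, 0), (1, 0)))
    return p[y + h, x + w] - p[y, x + w] - p[y + h, x] + p[y, x]

def haar_two_rect(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half-sum minus right half-sum."""
    return rect_sum(ii, x, y, w // 2, h) - rect_sum(ii, x + w // 2, y, w // 2, h)

# A vertical edge: dark left half, bright right half.
img = np.zeros((8, 8), dtype=np.int64)
img[:, 4:] = 10
ii = integral_image(img)
print(haar_two_rect(ii, 0, 0, 8, 8))  # → -320 (strong edge response)
```

A full Viola-Jones detector evaluates thousands of such features per candidate window through a boosted cascade, which is why these methods are computationally heavy relative to the few-pixel signal available on a mobile device.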
Other commonly used methods are, for example, Hidden Markov Models or back-propagation neural networks, both being complex.
All such methods are also generally difficult to engineer and optimize, and require considerable work to be adapted to follow, and take advantage of, the latest hardware developments such as multi-core processing or advances in GPU technology.
So, pupil position recognition in the spatial domain, with errors of a few pixels, must be combined with eye corner recognition, which quickly incurs several pixels of error, in order to capture a pupil movement relative to the eye corners which in total spans only several pixels.
This does not even consider the effects of head orientation, head movement and such on the accuracy.
Therefore, it will be clear that these calculations in the spatial domain result in it being practically impossible to calculate the gaze direction on a mobile device from the difference between pupil position and eye corner positions.
Hence, the only realistic option for obtaining the required gaze accuracy and resolution on a mobile device with a fully software solution is the use of information obtained from calculations in the frequency domain.
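Purely as an illustration of the kind of frequency-domain information referred to here (a generic phase-correlation sketch on synthetic data, not the method of the present disclosure), a displacement between two image patches can be recovered from the phase of their Fourier transforms rather than from pixel-level feature positions:

```python
import numpy as np

def phase_correlation(a, b):
    """Estimate the circular translation (dy, dx) of b relative to a
    from the phase of the cross-power spectrum, a frequency-domain
    method that avoids locating individual features in the image."""
    A = np.fft.fft2(a)
    B = np.fft.fft2(b)
    cross = np.conj(A) * B
    cross /= np.abs(cross) + 1e-12  # keep phase, discard magnitude
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map the correlation peak to a signed shift.
    h, w = a.shape
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

# Synthetic patch and a copy shifted 2 px down and 3 px right.
rng = np.random.default_rng(0)
a = rng.random((32, 32))
b = np.roll(a, shift=(2, 3), axis=(0, 1))
print(phase_correlation(a, b))  # → (2, 3)
```

The shift estimate comes from a single peak in the inverse transform of the normalized cross-power spectrum, illustrating why frequency-domain information can sidestep the pixel-accuracy limits of spatial-domain feature recognition.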