Line-of-sight tracking, also known as eye-movement tracking, is of great significance for understanding user behavior and for efficient human-computer interaction. More than 80% of the information perceptible to humans is received by the human eye, of which more than 90% is processed by the visual system. The line-of-sight therefore reflects much of the interaction between a human and the outside world. In recent years, the application value of line-of-sight tracking technology has gradually become prominent, thanks to the rapid development of virtual reality technology and human-computer interaction technology; on the other hand, calculating the line-of-sight direction remains a great challenge in the field of computer vision. Up until now, most solutions have been based on an active light source and an infrared camera, which require additional hardware and impose demanding conditions on the application environment. In one alternative method, a single camera is used to shoot a human eye image, from which the line-of-sight direction is then calculated. This eliminates the need for active illumination; however, a large number of training samples must be obtained in advance in order to learn a regression calculation model. For example, an early neural network system proposed by Baluja and Pomerleau requires thousands of training samples. Tan et al. proposed a method based on local linear interpolation, which maps the eye image to the coordinates of the line-of-sight and needs about 200 training samples.
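To make the idea of local linear interpolation concrete, the following Python sketch estimates line-of-sight coordinates for a new eye image from its nearest training samples. It is only an illustrative sketch and is not the procedure of Tan et al.: the function name `local_linear_gaze`, the use of generic feature vectors in place of eye images, the neighbor count `k`, and the regularization term `reg` are all assumptions introduced here.

```python
import numpy as np

def local_linear_gaze(train_feats, train_gaze, query, k=3, reg=1e-6):
    """Interpolate gaze coordinates from the k nearest training samples.

    train_feats: (n, d) feature vectors of training eye images (assumed given).
    train_gaze:  (n, 2) line-of-sight coordinates labeled for those images.
    query:       (d,) feature vector of the new eye image.
    """
    train_feats = np.asarray(train_feats, dtype=float)
    train_gaze = np.asarray(train_gaze, dtype=float)
    query = np.asarray(query, dtype=float)

    # Select the k training samples closest to the query in feature space.
    dists = np.linalg.norm(train_feats - query, axis=1)
    idx = np.argsort(dists)[:k]

    # Find weights that sum to 1 and best reconstruct the query from its
    # neighbors (least squares on the local Gram matrix).
    diffs = train_feats[idx] - query
    gram = diffs @ diffs.T
    # Small ridge term (an assumption) keeps the linear system solvable.
    gram += (reg * np.trace(gram) + 1e-12) * np.eye(len(idx))
    weights = np.linalg.solve(gram, np.ones(len(idx)))
    weights /= weights.sum()

    # Apply the same weights to the neighbors' gaze labels.
    return weights @ train_gaze[idx]
```

The accuracy of such a scheme depends on how densely the training samples cover the feature space, which is consistent with the need for a comparatively large labeled set.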
In order to reduce the number of training samples required, Williams et al. proposed a semi-supervised method capable of utilizing both labeled and unlabeled samples for training. Lu et al. proposed a self-adaptive regression framework based on sparse optimization, which allows fewer training samples to be used for calculation and addresses a series of related issues in line-of-sight calculation at the same time. Sugano et al. adopted a method that automatically generates training samples by extracting visual saliency from videos and then applies them to system training. The above methods are disadvantageous in that the position of the head is assumed to be fixed; if the methods are to work under conditions in which the position of the head changes, more training samples are needed to handle head movement.
In order to avoid system training completely, Yamazoe et al. and Heyman et al. proposed a method that calculates the line-of-sight from the position of the iris center relative to the eyeball center, considering that the line-of-sight direction is determined solely by the eyeball orientation, which can be obtained by calculating the orientation of the iris disc or its central position. Their method requires 3D modeling of the head and precise tracking of 3D feature points of the face, including the position of an eye corner and the central position of the eyeball. In practice, these feature points are usually difficult to extract precisely and are sometimes not even visible. Ishikawa et al. used a method that tracks facial feature points based on an active appearance model (AAM), which encounters the same problem. In some other methods, an ellipse is fitted to the iris contour and then back-projected to form a circle in 3D space. These methods derive from the fact that the iris contour can be regarded as approximately a circle, whose projection in a two-dimensional image is an ellipse, so that the orientation of the iris in the 3D world can be worked out by analyzing the shape of the ellipse. This is a common approach based on the shape of the iris contour. However, the traditional iris contour analysis method may not be reliable in practical application. The reason is that, in an image, the iris region is small in area and high in noise, which makes precise extraction of the contour very difficult; moreover, an error of only a few pixels in the extracted contour is enough to cause an enormous deviation in the calculated line-of-sight.
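The two geometric ideas described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions and not the implementation of any of the cited methods: the function names are hypothetical, the camera is assumed to look along +z, and the ellipse case assumes an orthographic (weak-perspective) projection, under which the ratio of the minor to the major axis equals the cosine of the tilt of the iris disc.

```python
import numpy as np

def gaze_from_centers(eyeball_center, iris_center):
    """Unit vector from the 3D eyeball center through the 3D iris center."""
    v = np.asarray(iris_center, dtype=float) - np.asarray(eyeball_center, dtype=float)
    return v / np.linalg.norm(v)

def iris_normal_from_ellipse(major, minor, angle_rad):
    """Estimate the iris-disc normal from a fitted ellipse.

    major, minor: semi-axis lengths in pixels, with major >= minor > 0.
    angle_rad:    orientation of the major axis in the image plane.
    Returns (normal, tilt), where tilt is the angle in radians between the
    normal and the camera axis. Assumes orthographic projection.
    """
    # A circle tilted by `tilt` is foreshortened by cos(tilt) along one axis.
    tilt = np.arccos(np.clip(minor / major, 0.0, 1.0))
    # The foreshortening happens along the minor axis, so the normal leans
    # in that image direction. The sign is ambiguous from the shape alone;
    # this sketch arbitrarily picks one of the two solutions.
    minor_dir = np.array([-np.sin(angle_rad), np.cos(angle_rad)])
    normal = np.array([
        np.sin(tilt) * minor_dir[0],
        np.sin(tilt) * minor_dir[1],
        -np.cos(tilt),  # negative z: the disc faces the camera
    ])
    return normal, tilt
```

Under this simplified model, if the major semi-axis of the fitted iris is 12 pixels and the minor semi-axis changes from 12 to 11 pixels, the estimated tilt moves from 0 to more than 20 degrees, which illustrates why an error of a few pixels in the extracted contour can cause a large deviation in the result.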
Therefore, in many cases, the only choice is to shoot a human eye image with ultra-high resolution, or to use a wearable camera to improve the precision, which raises the hardware requirements and imposes further restrictions on practical application scenarios. In view of the shortcomings of the aforementioned methods, the present invention provides a method for calculating the line-of-sight direction based on analysis and matching of the iris contour in a human eye image, combined with virtual generation of the iris appearance. It is mainly intended to overcome the poor stability and low precision of the traditional iris contour matching method when applied to a human eye image shot at normal resolution, so as to realize high-precision calculation of the 3D line-of-sight.