In various imaging systems and image processing applications, it is advantageous to automatically recognise the position and/or the orientation of a human head in a source image. For instance, a user may interact with a program running in a computer system, for example, a videogame program, by moving his head within the range of an imaging device. Alternatively, such a head recognition method may also be used in an imaging device for adjusting parameters such as aperture, exposure time, focus depth, etc., so as to optimize them for portraiture.
Interaction with computer systems, and, in particular, the input of data and commands, is a generally known issue. Conventionally, such interaction takes place through physical input devices such as keyboards, mice, scroll wheels, pens, touch-screens, joysticks, gamepads, etc., which produce signals in response to a physical action of the user. However, such physical input devices have many drawbacks. For instance, they can only offer a limited number of different input signals, which in some applications such as three-dimensional “virtual reality” environments will feel awkward and lack realism. Moreover, they are susceptible to wear and their continued use may even have negative consequences for the user's health, such as Repetitive Strain Injury (RSI).
Alternative input devices and methods are also known. For instance, practical systems for voice recognition are available. However, voice recognition is not a practical alternative for some applications, such as action games, where rapid, precise and repetitive inputs by the user are required. Moreover, the effectiveness of voice recognition systems is adversely affected by background noise, and they generally require a learning period to recognise a particular user's voice commands.
Another alternative is image recognition. In their simplest form, image recognition systems recognise binary patterns in contrasting colours, such as barcodes, and convert these patterns into binary signals for processing. More advanced image recognition systems can recognise more complex patterns in images and produce a large variety of signals in response. Such image recognition systems have been proposed, for instance, in U.S. Pat. No. 6,256,033, for recognising the gestures of a user in range of an imaging system. However, conventional imaging systems have no perception of depth and can produce merely a 2D projection of said user. As a result, the recognition of the user's gestures is inherently flawed, limited in the range of possible inputs and prone to recognition errors. In particular, such systems have problems separating the user from the background.
The development of 3D imaging systems, however, offers the possibility to develop shape recognition methods and devices allowing, for instance, better user gesture recognition. One such 3D imaging system was disclosed in G. Yahav, G. J. Iddam and D. Mandelboum, “3D Imaging Camera for Gaming Application”. The 3D imaging system disclosed in this paper is of the so-called “Time-Of-Flight” or TOF type, in which depth perception is obtained from the shape of a wavefront of light reflected from objects in range of the 3D imaging system. However, other types of imaging systems, such as stereo cameras, LIDAR, radar, sonar, etc., have also been proposed.
It has been proposed, for instance in International Patent Application WO 2008/128568 A1, to capture a 3D image of a scene, to select a subject, such as a human body, in said 3D image, and to segment this subject into a plurality of discrete regions including a head.
In U.S. Pat. No. 7,203,356, it was proposed, among various alternatives, to use ellipse or ellipsoid fitting in order to determine the position of a human head in a source image captured by a 3D imaging system. However, this prior art document does not disclose how the parameters of the ellipse or ellipsoid modelling the head are obtained.
A similar 3D model fitting method has been proposed by Zhengcheng Hu, Tetsuya Kawamura and Keiichi Uchimura in “Grayscale Correlation based 3D Model Fitting for Occupant Head Detection and Tracking”, Stereo Vision, ISBN 978-953-7619-22-0, November 2008, I-Tech, Vienna, Austria, pp. 91-102.
Yet another method using 3D data and ellipse fitting in order to track a human head was proposed by Ehsan Parvizi and Q. M. Jonathan Wu in “Real-Time 3D Head Tracking Based on Time-of-Flight Depth Sensor”, 19th IEEE International Conference on Tools with Artificial Intelligence. However, this paper also failed to disclose how the preferred parameters of the preferred head model were to be obtained.
In “Transformée de Hough elliptique floue rapide”, C. Leignel, O. Bernier, D. Collobert, and R. Seguier disclosed a particularly efficient computer-implemented method for recognising an elliptical contour in an image, and its application for head recognition. In this method, a particular type of elliptical Hough transform is used for recognising an elliptical shape in a contour image generated from a source image.
A Hough transform is a method for finding in an image an imperfect instance of an object within a certain class by a voting procedure. This voting procedure is carried out in a so-called accumulator array, from which object candidates are obtained as local intensity maxima. The accumulator array is populated by generating, in positions corresponding to those of individual points in the image, instances of the object which is being sought. In the particular case of an elliptical Hough transform, the object is an ellipse. The local intensity maxima in the accumulator array, that is, the positions where a plurality of ellipses intersect, represent candidate positions for a similar ellipse in the image. In the method disclosed by Leignel et al., in order to increase the computing speed, the accumulator array is populated with only representative segments of these ellipses. To increase the detection rate, fuzzy ellipses are used, with, for example, a decreasing intensity distribution around the ideal elliptical shape.
However, without advance knowledge of the expected size of the head in the image, a compromise must be found between computing speed and the likelihood of false positives. To alleviate this problem, in this prior art method only the contours of skin-coloured areas are taken into account. If the user wears skin-coloured clothing, however, the risk of false positives increases. Moreover, this prior art method is limited to detecting human heads within a relatively limited distance range from the imaging system, namely 1 to 2.5 meters.
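The voting procedure of an elliptical Hough transform, as described above, may be illustrated by the following minimal Python sketch. It is not the implementation of Leignel et al.: it votes with complete ellipses rather than representative segments, assumes axis-aligned ellipses whose semi-axes `a` and `b` are known in advance, and approximates the fuzzy ellipse by slightly larger and smaller ellipses carrying a reduced weight. The function names `elliptical_hough` and `best_centre` and the parameter `fuzz` are hypothetical and introduced for illustration only.

```python
import math


def elliptical_hough(contour_points, width, height, a, b, fuzz=1):
    """Populate an accumulator array with votes for ellipse centres.

    Assumptions of this sketch: the ellipse is axis-aligned and its
    semi-axes a (horizontal) and b (vertical) are known in advance.
    """
    acc = [[0.0] * width for _ in range(height)]
    steps = max(8, int(4 * max(a, b)))  # angular sampling of each ellipse
    for px, py in contour_points:
        voted = {}  # cell -> weight, so a point votes once per cell
        for k in range(steps):
            t = 2.0 * math.pi * k / steps
            for d in range(-fuzz, fuzz + 1):
                # Fuzzy ellipse: weight decreases away from the ideal shape.
                weight = 1.0 - abs(d) / (fuzz + 1)
                cx = int(round(px + (a + d) * math.cos(t)))
                cy = int(round(py + (b + d) * math.sin(t)))
                if 0 <= cx < width and 0 <= cy < height:
                    cell = (cx, cy)
                    voted[cell] = max(voted.get(cell, 0.0), weight)
        for (cx, cy), weight in voted.items():
            acc[cy][cx] += weight
    return acc


def best_centre(acc):
    """Return (x, y) of the accumulator cell with the highest vote."""
    _, x, y = max((v, x, y) for y, row in enumerate(acc)
                  for x, v in enumerate(row))
    return x, y
```

Each contour point generates an ellipse centred on itself in the accumulator array; the cell in which most of these ellipses intersect is taken as the candidate centre. Because the semi-axes are fixed here, the sketch sidesteps the problem of the unknown head size: without such knowledge, the procedure would have to be repeated over a range of sizes, at a corresponding cost in computing time.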
Other methods of locating a human head in a source depth image are described in published U.S. patent applications US 2005/031166, US 2005/058337 and US 2003/235341.
In addition, Clabian M. et al. have published, on the Internet, an article entitled “Head detection and localization from sparse 3D data”, INTERNET CITATION 2002, XP002389335, retrieved from URL:http://www.prip.tuwien.ac.at/~krw/papers/2002/DAGM/Clabian.pdf, relating to head detection. Krotosky S. J. et al. have also published an article entitled “Occupant posture analysis using reflectance and stereo images for smart airbag deployment”, INTELLIGENT VEHICLES SYMPOSIUM, 2004 IEEE, Parma, Italy, Jun. 14-17, 2004, Piscataway, N.J., USA, IEEE LNKD-DOI:10.1109/IVS.2004.1336469, 14 Jun. 2004, pages 698 to 703, XP010727732, ISBN: 978-0-7803-8310-4, that relates to the detection of an occupant of a seat in a vehicle to control the deployment of an airbag.