Conventional human-to-computer interfaces generally include hardware control system interfaces, such as keyboards, mice, remote controls and pointing devices. With such interfaces, a physical action needs to be performed with the hardware device itself, for example, touching, moving, holding, pointing, pressing, clicking, or even a plurality of these actions together, sequentially or simultaneously, in a way enabled by these device interfaces so that control commands can be sent to the computer system with which the interface is intended to interact.
More recently, natural interaction systems have appeared, for example, as described in US-A-2011/0115892, wherein conventional two-dimensional (2D) cameras are used for capturing light in the visible spectrum and for detecting a finger of a user. However, due to the limitations of that kind of technology, finger-like objects within the captured scene, for example, a pen, may be incorrectly identified or detected as fingers, or the tracking of the finger may be lost due to its dependency on the scene illumination. Advanced image processing techniques, however, make it possible to use a conventional camera to detect a hand and to provide an input allowing the analysis of the extremities of the hand. Nevertheless, using these techniques, it is still not possible to analyse accurately any other extremity present in the 3D scene, in particular with strong robustness at different distances or strong robustness to background illumination.
In US-A-2012/0069168, colour information is used to find different hand-related data or hand parameters, such as the palm centre and the base of the palm, as well as distances from the palm centre to a contour or extremities of the hand, using a mask of the hand extracted from the scene. These distances can even be used to assess whether the hand is closed or open, and, from that assessment, it can be determined if the hand is performing a gesture related to “select” or “grab”. However, such gesture-based methods have their limitations: they cannot provide a solid method for 3D pointing-like interaction with a computer, nor can they be operated in a dark environment wherein colours may not be distinguished. Moreover, the “grab” gesture detected is not very precise since the distances provided are only relative measurements and thus cannot be used to “point” at and “grab” a virtual object accurately at various distances from the sensor or imaging device. Furthermore, the method does not provide information such as an accurate objective distance between two independent points of interest in the 3D scene, which is mandatory for obtaining an accurate and reliable “grab” gesture and for measuring the level or relative amount of a “pinch” gesture of a hand in the scene.
However, information relating to a third dimension, namely, the depth, is an important addition which can now be determined by using an input from a range sensing camera. Moreover, a range sensing camera may operate, for example, in the infrared spectrum instead of the visible spectrum. Such a range sensing camera provides three-dimensional (3D) information which opens the possibility of having a more robust, stable, reliable and accurate model of the hand of a user, as scene capture is independent of the natural illumination of the scene, and as the absolute size of objects and the distances between points of interest can be determined whatever their distance from the image sensing device.
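By way of illustration only, the following minimal sketch shows why depth information makes the distance between two points of interest independent of their distance from the imaging device. It assumes an ideal pinhole camera model; the intrinsic parameters (`FX`, `FY`, `CX`, `CY`), the pixel coordinates and the function names are hypothetical and are not taken from any of the cited documents.

```python
import math

def back_project(u, v, depth_m, fx, fy, cx, cy):
    """Convert a pixel (u, v) with measured depth (metres) into a 3D point
    in the camera frame, assuming an ideal pinhole model (assumption)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

def metric_distance(p, q):
    """Euclidean distance in metres between two 3D points."""
    return math.dist(p, q)

# Hypothetical intrinsics for a 640x480 range sensing camera.
FX = FY = 500.0
CX, CY = 320.0, 240.0

# Two fingertips seen 50 pixels apart at a depth of 0.5 m.
near_a = back_project(320.0, 240.0, 0.5, FX, FY, CX, CY)
near_b = back_project(370.0, 240.0, 0.5, FX, FY, CX, CY)

# The same fingertips seen only 25 pixels apart at a depth of 1.0 m.
far_a = back_project(320.0, 240.0, 1.0, FX, FY, CX, CY)
far_b = back_project(345.0, 240.0, 1.0, FX, FY, CX, CY)

near_gap = metric_distance(near_a, near_b)
far_gap = metric_distance(far_a, far_b)

print(f"gap at 0.5 m: {near_gap:.3f} m, gap at 1.0 m: {far_gap:.3f} m")
```

In this sketch the separation in pixels halves when the hand moves twice as far from the camera, whereas the metric separation computed from the depth values is unchanged. A purely image-based, relative measurement cannot make that distinction.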
Up to now, robust detection of the hand and the tracking thereof, together with tracking of the fingers of the hand or of some other associated points of interest in three dimensions, has not been possible. In addition, robust recognition of different kinds of gestures performed sequentially or simultaneously by a single hand or its associated singular points of interest has also not been possible. In particular, there is currently no natural 3D gesture-based interaction system which is able to determine a plurality of singular points of interest on at least one single hand in a 3D scene, and to track these points of interest, enabling pointing and activation gestures to be recognised without false positive detection, even if these gestures are performed at the same time by the single hand with which they are associated.
In the field of graphical user interface (GUI) technologies, the use of an interface based on a pointer is common and the use of touch or multi-touch interfaces has been increasing. Representative input devices using such interfaces based on a pointer include mice and touch screens. Such input devices based on at least one pointer are advantageous in that the manipulation thereof is accurate and commands can clearly be distinguished and transferred to the GUI of an associated computer system. For example, a hardware mouse device simultaneously enables pointing to, and activation of, a feature using a click button, which provides clear feedback to the user about the status of his/her interactions. However, the use of hardware can be disadvantageous as part of the feedback needs to be made by contact with the hardware device itself.
In the field of image processing for enabling human-to-computer interactions, several techniques have recently been developed around finger and hand detection, tracking and identification, and, in a very limited proportion, around the recognition of their movements in space. Moreover, real-time computer vision-based human finger recognition has mostly been focused on fingerprint recognition and palm print recognition for authentication applications. Furthermore, recognising a human finger in complex backgrounds, tracking finger movement and interpreting finger movements as predefined gestures have conventionally been limited by the capabilities of the imaging system and of the image signal processing systems supporting the imaging system. One consequence is that no real effort has been made to provide clear, unambiguous feedback for hand/finger 3D gesture-based natural interactions.
Meanwhile, a natural interaction technique for controlling a pointer by recognising and tracking the 3D motion of a part of the body of a user, for example, the hand or a finger on the hand, is known to demonstrate a relatively low recognition ratio since there is still a need to distinguish clearly between motions corresponding to control and those corresponding to movements which are not linked to the interaction itself. A common technique to solve that problem requires non-intuitive, difficult-to-use special actions, such as clear, sequentially executed ‘start’ and/or ‘stop’ gestures, which are not compatible with efficient single-hand simultaneous “pointing” and “activation” gesture recognition. Furthermore, hand or finger gesture-based natural interaction techniques are also problematic as it is still difficult to make the displacement of a pointer attributable to motion of a user in 3D space correspond to the displacement of a mouse from the standpoint of the user. This is particularly true with GUIs or interactive systems which are not developed for use in compliance with natural interactions, and, in particular, with interactive systems which are not able to provide feedback to the user performing the natural gesture indicating whether the gesture has been recognised or not. This is quite different to a hardware mouse where the activation button provides a physical click as activation feedback.