With the advent of super-fast computing systems and highly efficient digital imaging systems, the field of computer vision based man-machine interaction has undergone a period of significant technological advancement. Systems ranging from simple motion detection, where motion triggers a response from a machine (e.g., surveillance systems), to highly complex three-dimensional (“3D”) imaging sign recognition have been the subject of significant development in the last few years. For example, in the area of sign based human-machine communications, the recognition of human sign language has recently been a subject of much study as a promising technology for man-machine communications. Other sign recognition systems, and even more complex gesture recognition systems, have been developed based on various methods to locate and track hands and their motion with respect to other body parts (e.g., arms, torso, head, and the like).
These conventional techniques for sign and gesture recognition generally require markers, specific colors, backgrounds, or gloves to aid the machine vision system in finding the source of the sign or gesture. For example, some conventional approaches for hand detection use color or motion information to determine the image region that corresponds to the hand or hands gesturing to the system. In these approaches, tracking hand motion is highly unreliable under varying lighting conditions. Some systems use special equipment such as gloves, while others use a background of a specific color to make the task feasible.
In these conventional systems, even when the hand location can be determined and tracked across image frames, analyzing the shape of the hand to determine the sign being provided is still a very difficult task. The shape recognition task becomes even more difficult when the shape must be analyzed through real-time processing of the image data. To improve the efficiency of the hand shape analysis process, a magnified view of the hand region can be obtained by focusing the image capturing device on the appropriate region of the scene. However, any body, arm, head, or other posture information may then fall outside the imaging frame and be missed.
To capture whole body posture information in the image data, some depth capable systems use multiple cameras as a means to determine depth information. Images captured from different angles provide sufficient data to extract 3D information about the subject. To understand the 3D volume obtained, 3D model fitting has also been proposed, yet it is a complex and computation-intensive process that is frequently unstable and generally not fit for real-time applications. Stereo vision, which is a popular choice in depth capable systems, usually does not provide image resolution sufficient for hand shape analysis due to lack of texture on the subject. Thus, depth has primarily been used to detect a limited set of simple gestures such as large pointing motions. Some approaches use high-resolution 3D data obtained by using coded lighting for static hand posture analysis. However, these high-resolution systems require massive computational and storage resources that are generally unavailable or not capable of real-time performance.
Accordingly, what is needed is an approach to computer based gesture and sign recognition for human-computer interaction that (i) can analyze multiple features simultaneously, including hand location, hand shape, movement, orientation, speed, and the like; (ii) can recognize gestures without using any special background, marks, or gloves; (iii) is based on a computationally inexpensive computation module to achieve real-time gesture recognition; and (iv) is less susceptible to illumination changes.