1. Field of the Invention
The present invention relates generally to human-machine interaction, and more particularly to recognizing hand signs in sign language.
2. Description of Background Art
With the advent of super-fast computing systems and highly efficient digital imaging systems, the field of computer vision based man-machine interaction has undergone a period of significant technological advancement. Systems ranging from simple motion detection systems, in which motion triggers a response from a machine (e.g., surveillance systems), to highly complex three-dimensional (“3D”) imaging sign recognition systems have been the subject of significant development in the last few years. For example, in the area of sign based human-machine communications, the recognition of human sign language has been the subject of much recent study as a promising technology for man-machine communications. Other sign recognition systems and even more complex gesture recognition systems have been developed based on various methods to locate and track hands and their motion with respect to other body parts (e.g., arms, torso, head, and the like).
These conventional techniques for sign and gesture recognition generally require markers, specific colors, backgrounds or gloves to aid the machine vision system in finding the source of the sign or gesture. For example, some conventional approaches for hand detection use color or motion information to determine the image region that corresponds to the hand or hands gesturing to the system. In these approaches, tracking hand motion is highly unreliable under varying lighting conditions. Some systems use special equipment such as gloves, while some others use a background with specific color to make the task feasible.
Another group of conventional techniques uses depth images generated by stereo vision cameras or time-of-flight sensors for sign and gesture recognition. The depth images are analyzed to extract image pixels representing hands and arms. The extracted image pixels are then further processed and matched with stored hand shapes to recognize the hand signs that they represent. The trajectory of the hands and arms may also be tracked to determine gestures and sign language represented by motions and shapes of the hands in a sequence of images. The conventional techniques using the depth images are advantageous compared to the techniques requiring markers, specific colors, backgrounds or gloves because they are more convenient to implement in uncontrolled real-life environments. Further, the conventional techniques using depth images also have the advantage that additional equipment or devices need not be provided to a person communicating via the sign language or installed around that person.
Conventional hand sign recognition systems using the depth images, however, can recognize only hand signs that involve one hand, or two hands that do not adjoin or overlap. The depth images are generally gray-scale images in which each image pixel indicates the distance between the cameras and a target subject. In such depth images, ambiguity as to which image pixels represent which objects (e.g., hands) arises when two or more objects (e.g., hands) overlap or adjoin in the depth images.
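By way of illustration only, the ambiguity described above can be sketched with a toy depth image. The sketch below assumes a simple depth threshold followed by 4-connected region labeling; this is a hypothetical stand-in and is not asserted to be the method of any particular conventional system. The function name `count_foreground_blobs`, the threshold of 100, and the depth values (arbitrary units, smaller meaning nearer) are likewise assumptions made for the example.

```python
from collections import deque


def count_foreground_blobs(depth, max_distance):
    """Count 4-connected regions of pixels nearer than max_distance.

    depth is a list of rows of distances (arbitrary units, assumed).
    """
    rows, cols = len(depth), len(depth[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = 0
    for r in range(rows):
        for c in range(cols):
            # Skip background pixels and pixels already assigned to a blob.
            if seen[r][c] or depth[r][c] >= max_distance:
                continue
            blobs += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            # Breadth-first flood fill over the four direct neighbors.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and depth[ny][nx] < max_distance):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return blobs


# Two hands (depths 80 and 82) separated by a column of background (200).
separate_hands = [
    [200, 200, 200, 200, 200, 200, 200],
    [200,  80,  80, 200,  82,  82, 200],
    [200,  80,  80, 200,  82,  82, 200],
    [200, 200, 200, 200, 200, 200, 200],
]

# The same two hands moved together so that they adjoin.
adjoining_hands = [
    [200, 200, 200, 200, 200, 200, 200],
    [200, 200,  80,  80,  82,  82, 200],
    [200, 200,  80,  80,  82,  82, 200],
    [200, 200, 200, 200, 200, 200, 200],
]

print(count_foreground_blobs(separate_hands, 100))   # 2
print(count_foreground_blobs(adjoining_hands, 100))  # 1
```

In this toy case the separated hands are found as two regions, whereas the adjoining hands merge into a single region because their depth values are nearly equal, so the labeling alone cannot indicate which pixels belong to which hand.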
For example, FIGS. 1A to 1C illustrate three examples from Japanese Sign Language (JSL) that involve overlapping or adjoining hands. FIG. 1A illustrates a sign that means “letter,” FIG. 1B illustrates a sign that means “well,” and FIG. 1C illustrates a sign that means “fit” in JSL. In conventional sign recognition systems using the depth images, such overlapping or adjoining hand shapes and motions may not be recognized or identified because conventional sign recognition systems do not have the capability to separate image pixels representing one hand from those representing the other when the hands overlap or adjoin. Therefore, conventional sign recognition systems using depth images can recognize only a portion of the signs that are generally used in sign language.