Hand shape and gesture recognition has been an active area of investigation during the past decade. Beyond the quest for a more “natural” interaction between humans and computers, there are many interesting applications in robotics, virtual reality, tele-manipulation, tele-presence, and sign language translation. According to the American Sign Language Dictionary, a sign is described in terms of four components: hand shape, location in relation to the body, movement of the hands, and orientation of the palms. Hand shape (position of the fingers with respect to the palm), the static component of the sign, along with the orientation of the palm, forms what is known as “posture”. A set of 26 unique distinguishable postures makes up the alphabet in ASL used to spell names or uncommon words that are not well defined in the dictionary.
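By way of illustration only, the four-component description of a sign, and the derivation of a “posture” from two of those components, can be sketched as a small data structure. The class names, field names, and string labels below (`Sign`, `Posture`, `"closed_fist"`, `"palm_forward"`, and so on) are hypothetical placeholders chosen for this sketch. They are not taken from the American Sign Language Dictionary or from any prior system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posture:
    """Static part of a sign: hand shape plus palm orientation."""
    hand_shape: str    # position of the fingers with respect to the palm
    orientation: str   # orientation of the palm


@dataclass(frozen=True)
class Sign:
    """A sign described by the four components named in the text."""
    hand_shape: str
    location: str      # location in relation to the body
    movement: str      # movement of the hands
    orientation: str

    @property
    def posture(self) -> Posture:
        # A posture keeps only hand shape and orientation;
        # location and movement are dropped.
        return Posture(self.hand_shape, self.orientation)


# Placeholder values for illustration; the labels are assumptions.
example_sign = Sign(
    hand_shape="closed_fist",
    location="in_front_of_shoulder",
    movement="none",
    orientation="palm_forward",
)
example_posture = example_sign.posture
```

Under this sketch, two signs that differ only in location or movement share the same posture, which is why a recognizer for the fingerspelled alphabet needs to capture only hand shape and palm orientation.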
While some applications, like image manipulation and virtual reality, allow the researcher to select a convenient set of postures which are easy to differentiate, such as point, rotate, track, fist, index, victory, or the “NASA Postures”, the well-established ASL alphabet contains some signs which are very similar to each other. For example, the letters “A”, “M”, “N”, “S”, and “T” are signed with a closed fist. The amount of finger occlusion is high and, at first glance, these five letters can appear to be the same posture. This makes the recognition task very hard for vision-based systems. Efforts have been made to recognize the shapes using the “size function” concept on a Sun Sparc Station with some success. Some researchers achieved a 93% recognition rate in the easiest case (the most recognizable letters), and a 70% recognition rate in the most difficult case (the letter “C”), using colored gloves and neural networks. Others have implemented a successful gesture recognizer with accuracy as high as 98%.
Although instrumented gloves are described as “cumbersome”, “restrictive”, and “unnatural” by those who prefer vision-based systems, they have been more successful at recognizing postures. The Data Entry Glove, described in U.S. Pat. No. 4,414,537 to Grimes, translates postures into ASCII characters for a computer using switches and other sensors sewn to the glove.
In a search for more affordable options, a system for Australian Sign Language based on Mattel's Power Glove was proposed, but the glove could not be used to recognize the alphabet hand shapes because of a lack of sensors on the little finger. Others mounted piezo-resistive accelerometers on five rings for a typing interface, and some used accelerometers at the fingertips to implement a tracking system for pointing purposes. These gloves have not been applied to ASL finger spelling.
American Sign Language (ASL) is the native language of some 300,000 to 500,000 people in North America. It is estimated that 13 million people in the United States alone, including members of both the deaf and hearing populations, can communicate to some extent in sign language, making it the fourth most used language in the country. It is, therefore, appealing to direct efforts toward electronic sign language translators.
Researchers of Human-Computer Interaction (HCI) have proposed and tested some quantitative models for gesture recognition based on measurable parameters. Yet, the use of models based on the linguistic structure of signs that ease the task of automatic translation of sign language into text or speech is in its early stages. Linguists have proposed different models of gesture from different points of view, but they have not agreed on definitions and models that could help engineers design electronic translators. Existing definitions and models are qualitative and difficult to validate using electronic systems.
As with any other language, differences are common among signers depending on age, experience or geographic location, so the exact execution of a sign varies but the meaning remains. Therefore, any automatic system intended to recognize signs has to be able to classify signs accurately with different “styles” or “accents”. Another important challenge that has to be overcome is the fact that signs are already defined and cannot be changed at the researcher's convenience or because of sensor deficiencies. In any case, to balance complexity, training time, and error rate, a trade-off takes place between the signer's freedom and the device's restrictions.
Previous approaches have focused on two objectives: the hand alphabet, which is used to fingerspell words, and complete signs, which are formed by dynamic hand movements.
The instruments used to capture hand gestures can be classified into two general groups: video-based and instrumented. The video-based approaches claim to allow the signer to move freely without any instrumentation attached to the body. Trajectory, hand shape and hand locations are tracked and detected by a camera (or an array of cameras). As a result, the signer is constrained to sign in a closed, somewhat controlled environment. The amount of data that has to be processed to extract and track hands in the image also places demands on the memory, speed and complexity of the computer equipment.
To capture the dynamic nature of hand gestures, it is necessary to know the position of the hand at certain intervals of time. For instrumented approaches, gloves are complemented with infra-red, ultrasonic or magnetic trackers to capture movement and hand location with a resolution that ranges from centimeters (ultrasonic) to millimeters (magnetic). The drawback of these types of trackers is that they force the signer to remain close to the radiant source and inside a controlled environment free of interference (magnetic or luminescent) or interruptions of line of sight.
A number of sign language recognition apparatus and gesture recognition systems have been proposed. Examples of these prior devices are disclosed in U.S. Pat. No. 5,887,069 to Sakou et al., U.S. Pat. No. 5,953,693 to Sakiyama et al., U.S. Pat. No. 5,699,441 to Sagawa et al., U.S. Pat. No. 5,714,698 to Tokioka et al., U.S. Pat. No. 6,477,239 to Ohki et al., and U.S. Pat. No. 6,304,840 to Vance et al.
While a number of these prior apparatus have been successful for their intended purpose, there is a continuing need for improved sign language recognition systems.