1. Field of the Invention
The present invention relates to methods and devices for segmenting hand gestures, more specifically to a method and device for automatically segmenting hand gestures for sign language, for example, into words when recognizing the hand gestures.
2. Description of the Background Art
In recent years, pointing devices have made input to personal computers and similar equipment easy. Because they eliminate complicated keyboard operation, they are becoming popular among general users as well as professional users.
Further, with the technology of automatically recognizing a user's voice being lately developed, voice-inputting-type personal computers and home electrical appliances equipped with voice-instructing-type microcomputers have appeared on the market (hereinafter, such personal computer or home electrical appliance equipped with a microcomputer is referred to as a computer device). Supposing this technology sees further progress, input operation for the computer device may be approximated to a manner observed in interpersonal communication. Moreover, users who have difficulty in operating with hands may easily access the computer device thanks to the voice-inputting system.
People communicate with each other not only by talking but also by moving their hands or heads and changing facial expressions. If the computer device can automatically recognize such motions observed in specific parts of the body, users can handle input operation in a manner similar to interpersonal communication. Further, users who have difficulty operating by voice can easily access the computer device using sign language. The computer device can also be used to translate sign language.
To meet this need, a computer device that recognizes the motions observed in specific parts of the user's body, including hand gestures for sign language, has been developed by the Assignees of the present invention and others. The processing executed in such a conventional computer device to recognize the hand gestures for sign language is as follows:
First, a user is photographed, and his/her image is stored. Second, a part of the image is specified as a hand(s). Thereafter, motions of the hand(s) are detected, and a word for sign language matching the detected motions is specified by referring to a dictionary describing how gestures for sign language are made. In this manner, the computer device "recognizes" the user's sign language.
Hereinafter, as to the aforementioned procedure, a process executed to specify words for sign language in accordance with the motions of hands is described in more detail.
Every word for sign language is generally composed of one unit gesture or a combination of several unit gestures. The unit gesture herein means a minimum indivisible gesture such as raising, lowering, or bending. Assuming that the unit gestures are A, B, and C, words for sign language may be represented as (A), (B), (C), . . . , (A, B), (A, C), (B, C), . . . , (A, B, C), . . . People talk in sign language by combining these words for sign language.
Supposing that the word for sign language (A) means "power", and the word for sign language (B, C) means "cutting off", a meaning of "cutting off power" is completed by expressing the words for sign language (A) and (B, C), that is, by successively making the unit gestures of A, B, and C.
In face-to-face sign language, when a person who talks by sign language (hereinafter, signer) successively makes the unit gestures A, B, and C with the words for sign language (A) and (B, C) in mind, his/her partner can often intuitively recognize that the series of unit gestures is directed to the words for sign language (A) and (B, C). On the other hand, when sign language is inputted into the computer device, the computer device cannot recognize the series of unit gestures A, B, and C as the words for sign language (A) and (B, C), even if the user successively makes the unit gestures A, B, and C with those words in mind.
Therefore, the user has had to make a predetermined gesture such as a pause (hereinafter, segmentation gesture a) between the words for sign language (A) and (B, C). To be more specific, when the user wants to input "cutting off power", he/she expresses the words for sign language (A) and (B, C) with the segmentation gesture a interposed therebetween; that is, the unit gesture A is made first, then the segmentation gesture a, and finally the unit gestures B and C. The computer device then detects the series of gestures made by the user, segments the series before and after the segmentation gesture a, and obtains the words for sign language (A) and (B, C).
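The conventional segmentation described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the detected gestures are already available as a list of string codes and that the segmentation gesture a is encoded as the lowercase string "a"; the function name and the encoding are hypothetical and are not taken from the conventional device.

```python
# Hypothetical code assigned to the segmentation gesture (e.g., a pause).
SEGMENTATION_GESTURE = "a"


def split_on_segmentation_gesture(gestures, marker=SEGMENTATION_GESTURE):
    """Split a series of gesture codes into words at each segmentation gesture."""
    words = []
    current = []
    for gesture in gestures:
        if gesture == marker:
            # The segmentation gesture closes the word collected so far.
            if current:
                words.append(tuple(current))
                current = []
        else:
            current.append(gesture)
    # Close the last word if the series did not end with a segmentation gesture.
    if current:
        words.append(tuple(current))
    return words


# Unit gesture A, then segmentation gesture a, then unit gestures B and C.
print(split_on_segmentation_gesture(["A", "a", "B", "C"]))
# prints [('A',), ('B', 'C')]
```

In this sketch the result is unambiguous only because the user explicitly supplies the segmentation gesture between the words.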
As is known from the above, in the conventional gesture recognition method executed in the computer device, every time the user inputs a sentence composed of several words with the hand gestures for sign language, he/she has no choice but to insert a segmentation gesture between the hand gesture corresponding to one word and the hand gesture corresponding to the word that follows, which is burdensome. This is because the conventional gesture recognition method cannot automatically segment the detected gestures into words.
Note that one conceivable method of segmenting a series of detected unit gestures (a gesture code string) into words is a process similar to that of a Japanese word processor, in which a character code string is segmented into words and then converted into characters.
In this case, however, the gesture code string is segmented by referring to a dictionary in which words are registered, so the positions where the gesture code string is segmented are not uniquely defined. The computer device therefore has to offer the user several alternative segmentation positions, and the user has to select the one best suited to his/her purpose. This gives the user a lot of trouble and, at the same time, makes the input operation slow.
For example, suppose the dictionary incorporated in the computer device includes the words for sign language (A), (B), (C), . . . , (A, B), (A, C), (B, C), . . . , (A, B, C), . . . , and is referred to in order to find where to segment the unit gestures A, B, and C successively made by the user with the words for sign language (A) and (B, C) in mind. The position to segment cannot be limited to one. Therefore, the computer device segments at several potential positions and offers the user alternatives such as (A) and (B, C), (A, B) and (C), or (A, B, C). In response, the user selects the one that best fits his/her purpose and notifies the computer device of the selected position.
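The ambiguity of dictionary-based segmentation can be illustrated with a minimal sketch. It assumes a small hypothetical dictionary containing only the seven words listed above, represented as tuples of unit gesture codes; the function name and the data representation are illustrative and do not describe an actual word processor or recognition device.

```python
# Hypothetical dictionary limited to the words named in the text.
DICTIONARY = {
    ("A",), ("B",), ("C",),
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("A", "B", "C"),
}


def all_segmentations(codes, dictionary=DICTIONARY):
    """Return every way to split the gesture code string into dictionary words."""
    codes = tuple(codes)
    if not codes:
        return [[]]  # one way to segment an empty string: no words
    results = []
    for end in range(1, len(codes) + 1):
        head = codes[:end]
        if head in dictionary:
            # Keep this word and segment the remainder recursively.
            for rest in all_segmentations(codes[end:], dictionary):
                results.append([head] + rest)
    return results


for candidate in all_segmentations(["A", "B", "C"]):
    print(candidate)
# prints:
# [('A',), ('B',), ('C',)]
# [('A',), ('B', 'C')]
# [('A', 'B'), ('C',)]
# [('A', 'B', 'C')]
```

With this dictionary the string A, B, C has four valid segmentations, including (A) and (B, C), (A, B) and (C), and (A, B, C), and the dictionary alone gives no basis for choosing the one the user intended.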
As is evident from the above, such a segmentation system based on the gesture code string is not sufficient to automatically segment the series of unit gestures to be detected.
Therefore, an object of the present invention is to provide a hand gesture segmentation method and device for automatically segmenting detected hand gestures into words, when recognizing the hand gestures, without requiring the user to indicate where to segment.