The present invention relates to a sign language telephone device to be used in a case where an aurally handicapped person talks with a normal person in a distant place who does not know the sign language.
The sign language has been developed to enable communication between aurally handicapped persons. By using the sign language, an aurally handicapped person is able to converse directly with another aurally handicapped person close to him or her with hand gestures, body gestures, facial expressions, etc. In the case of communication between aurally handicapped persons who are apart from each other, intentions could be conveyed in real time by performing sign language gestures using videophone devices.
On the other hand, research on sign language translation systems has recently been actively pursued so that an aurally handicapped person who uses the sign language is able to converse with a normal person who does not know the sign language (Reference: Masaru Oki, Hirohiko Sagawa, Tomoko Sakiyama, Eiji Ohira, Hiromichi Fujisawa: Information Processing Media Research Society, 15-6, Information Processing Society of Japan, 1994). The sign language translation system is composed of a sign-language-to-Japanese-translation-subsystem and a Japanese-to-sign-language-translation-subsystem.
(1) The sign-language-to-Japanese-translation-subsystem is composed of a sign language recognition unit which recognizes the sign language and translates it to a sign language word train, and a sign-language-to-Japanese-translation-unit which translates the recognized sign language words to Japanese. In the sign language recognition unit, hand gestures are input using a glove-based input device, each input hand gesture is compared with the standard hand gestures, and the sign language word whose standard hand gesture is closest is selected. The sign-language-to-Japanese-translation-unit translates a sign language word train to Japanese using a correspondence table between sign language words and Japanese words and a conversion rule from a sign language sentence to a Japanese sentence.
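The recognition and word lookup steps in (1) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the feature vectors, the Euclidean distance measure, the word names, and the romanized Japanese entries do not come from the cited system, and the conversion rule from a sign language sentence to a Japanese sentence is omitted.

```python
import math

# Hypothetical standard hand gestures: one feature vector per sign language
# word (for example, finger bend values from a glove). Values are invented.
STANDARD_GESTURES = {
    "MORNING": [0.1, 0.9, 0.8, 0.2],
    "GREETING": [0.7, 0.1, 0.2, 0.9],
    "FINE": [0.5, 0.5, 0.1, 0.1],
}

# Hypothetical correspondence table between sign language words and
# (romanized) Japanese words.
WORD_TABLE = {"MORNING": "asa", "GREETING": "aisatsu", "FINE": "genki"}


def recognize_word(gesture):
    """Return the sign language word whose standard gesture is closest."""
    return min(
        STANDARD_GESTURES,
        key=lambda word: math.dist(gesture, STANDARD_GESTURES[word]),
    )


def to_japanese_words(gestures):
    """Recognize each input gesture, then look up its Japanese word."""
    return [WORD_TABLE[recognize_word(g)] for g in gestures]


if __name__ == "__main__":
    train = [[0.12, 0.88, 0.79, 0.25], [0.69, 0.10, 0.20, 0.90]]
    print(to_japanese_words(train))
```

A real recognizer would compare gesture sequences over time rather than single vectors; the sketch only shows the "closest standard gesture" selection and the table lookup.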
(2) The Japanese-to-sign-language-translation-subsystem is composed of a Japanese-to-sign-language-translation-unit which translates Japanese to the sign language, and a sign language generation unit which displays the sign language as an animation using three-dimensional computer graphics. The Japanese-to-sign-language-translation-unit analyzes Japanese and translates it to a sign language word train using a correspondence table between Japanese words and sign language words and a conversion rule from Japanese sentences to sign language sentences. The sign language generation unit generates sign language animations using a (sign language words)-(animation data) dictionary, which stores sets of indexes of sign language words and the corresponding data of hand gestures or facial expressions registered beforehand. In the generation of a sign language animation, the animation data corresponding to the sign language words in a sign language word train are retrieved, and a human body model moves based upon the retrieved data. The movement of the model is made to appear continuous by interpolating the gaps between the sign language words.
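The gap interpolation described in (2) can be sketched as follows. This is a simplified illustration under stated assumptions: a pose is a flat list of joint values, the interpolation is linear, and the function names and the dictionary layout are hypothetical. The source does not specify the interpolation method.

```python
def interpolate_gap(end_pose, start_pose, steps):
    """Return `steps` poses between two poses, excluding both endpoints."""
    frames = []
    for i in range(1, steps + 1):
        t = i / (steps + 1)  # fraction of the way from end_pose to start_pose
        frames.append([a + (b - a) * t for a, b in zip(end_pose, start_pose)])
    return frames


def build_animation(word_train, dictionary, steps=3):
    """Concatenate the registered frames of each word, filling the gaps."""
    frames = []
    for word in word_train:
        word_frames = dictionary[word]  # animation data retrieved by index
        if frames:
            # Bridge from the last pose of the previous word to the first
            # pose of this word so the model appears to move continuously.
            frames.extend(interpolate_gap(frames[-1], word_frames[0], steps))
        frames.extend(word_frames)
    return frames


if __name__ == "__main__":
    # Hypothetical dictionary: two words, two poses each, two joints per pose.
    dictionary = {
        "A": [[0.0, 0.0], [1.0, 0.0]],
        "B": [[1.0, 4.0], [2.0, 4.0]],
    }
    for frame in build_animation(["A", "B"], dictionary):
        print(frame)
```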
However, the sign language translation system has basically been developed for direct communication between an aurally handicapped person and a normal person who are close to each other, and it has not been shown how to apply the configuration simply to a long distance call (conversation).
If the conventional sign language translation system is extended to long distance calls, several problems arise.
In the first place, the configuration of the device becomes large in scale and complicated. The above-mentioned sign language translation system is assumed to be a stand-alone system, and in a case where it is extended to long distance calls, the following form can ordinarily be considered: the sign-language-to-Japanese-translation-subsystem and the Japanese-to-sign-language-translation-subsystem are composed separately and these subsystems are connected to each other through a network.
However, in a conventional sign language translation system, the sign-language-to-Japanese-translation-subsystem and the Japanese-to-sign-language-translation-subsystem share the dictionary data base and the correspondence table between sign language words and Japanese words used in the sign-language-to-Japanese-translation-unit (Japanese-to-sign-language-translation-unit) in order to economize on storage capacity.
For example, if, for the sake of long distance calls, the sign-language-to-Japanese-translation-subsystem and the Japanese-to-sign-language-translation-subsystem are made separate and independent from each other, with the sign-language-to-Japanese-translation-subsystem provided on the side of an aurally handicapped person and the Japanese-to-sign-language-translation-subsystem provided on the side of a normal person, then identical data for translation have to be provided in duplicate, which naturally makes the device configuration large in scale and complicated.
In the second place, there is another problem in that it is difficult to use an existing network for long distance calls (conversation). In a case where the sign-language-to-Japanese-translation-subsystem is provided on the aurally handicapped person side and the Japanese-to-sign-language-translation-subsystem is provided on the normal person side, each subsystem must transmit its translated Japanese sentences or sign language animations to the other. In particular, the transmission of sign language animations involves the transmission of a large quantity of images, so that long distance calls require a network infrastructure able to cope with the high speed transmission of a large volume of data. Image transmission is possible with the present videophone facilities; however, in the case of the sign language, unless the subtle forms and movements of the hands, etc. are accurately transmitted and displayed, misunderstandings or erroneous recognition may occur, which may cause trouble in communication.
Therefore, up to now, an aurally handicapped person who uses the sign language has had no easy means of conversing with a normal person in a distant place who does not know the sign language. Instead, the two parties have communicated by transmitting characters or pictures by facsimile.
Consequently, an aurally handicapped person who wants to talk in the sign language has had difficulty communicating with a normal person in a distant place who does not know the sign language.
The purpose of the present invention is to offer a simple device with which an aurally handicapped person who uses the sign language is able to communicate with a normal person in a distant place who does not know the sign language.
Another purpose of the present invention is to offer a device which enables an aurally handicapped person who uses the sign language to communicate, through an existing network, with a normal person in a distant place who does not know the sign language.
The present invention proposes a new concept called a sign language telephone device. In short, the present invention allows an aurally handicapped person who uses the sign language to communicate with a normal person in a distant place who does not know the sign language using the infrastructure of the existing videophone facilities. In the case of the present invention, the videophone on the side of an aurally handicapped person who uses the sign language is provided with both sign-language-to-Japanese-translation-function and Japanese-to-sign-language-translation-function and it is connected to the videophone on the side of a normal person through a network.
In the present invention, a videophone device having a sign language translation function (a sign-language-to-Japanese-translation-function and a Japanese-to-sign-language-translation-function) to be used by an aurally handicapped person is called a sign language telephone device, and an ordinary videophone device used by a normal person is called a videophone device on the normal person side. The present invention makes it possible to hold a conversation between a sign language telephone device and a videophone device on the side of a normal person while performing sign language translation.
The framework of the whole system according to the present invention is fundamentally constituted with 3 elements, a sign language telephone device, a network and a videophone device; however, one of the features of the present invention is in that various functions are concentrated in the sign language telephone device.
The sign language telephone device comprises several characteristic means such as a sign language input means, a videophone connection means, the sign-language-to-Japanese-translation-subsystem, and the Japanese-to-sign-language-translation-subsystem, besides a TV set, camera, microphone, and videophone control device which are found in an ordinary videophone device.
Supposing a case where an aurally handicapped person actually calls a normal person in a distant place on a sign language telephone device, the fundamental operation of the present invention will be explained.
An aurally handicapped person dials the telephone number of a normal person on the other end of the line, and when the normal person comes to the phone, the aurally handicapped person starts to communicate with him or her. In that case, the aurally handicapped person inputs the sign language through the sign language input means in the sign language telephone device, and the input sign language is recognized by the sign-language-to-Japanese-translation-subsystem, translated to a sign language word train, and further translated to Japanese. The translated Japanese is output as a synthesized voice to the videophone device on the side of the normal person through the videophone connection means and a network (public network). On the videophone device on the side of the normal person, an actual image input by the camera in the sign language telephone device on the side of the aurally handicapped person is displayed. When the voice is synthesized, it can be adjusted to suit the aurally handicapped person: a man's voice or a woman's voice, the quality of the voice, the speed of speaking, the loudness of the voice, a high or low voice, etc. can be selected. In the case of a female aurally handicapped person, a female voice is naturally desirable as the synthesized voice. In the case of a young person, a high-pitched voice might be desirable. The tone of the Japanese voice, which is the result of translation of the sign language of an aurally handicapped person, can be used to identify the aurally handicapped person.
On the side of the normal person, the response is given to the videophone device by voice. The voice, transmitted through the network (a public network) and the videophone connection means in the sign language telephone device, is recognized in the Japanese-to-sign-language-translation-subsystem, the recognized Japanese is translated to the sign language, and the translated sign language is expressed as a sign language animation and displayed on the TV set.
The above-mentioned procedures are repeated: the aurally handicapped person responds in the sign language, and the normal person basically responds by voice. In the case where a normal person calls an aurally handicapped person on a videophone, the procedures are almost the same as in the above-mentioned case, except for which party dials first.
For communication using the sign language as described above, some further contrivances are necessary.
In the first place, in the sign-language-to-Japanese-translation-subsystem, it is made possible to select a translation mode or a non-translation mode.
For an aurally handicapped person, the sign language is the means of communication, so there is a fear that all hand gestures may be recognized as gestures for communication. While a sign language telephone device is being used, a movement of the hands not included in the sign language, for example, the movement of a hand for drinking coffee, may be recognized as a gesture in the sign language. To address this, in the translation mode of the present invention the movements of the hands are translated as sign language, but in the non-translation mode they are not translated. Methods of changeover between the translation mode and the non-translation mode are shown below.
(1) A method performed with a button,
(2) A method in which non-translation mode is selected when the face is not looking forward,
(3) A method in which the translation mode and the non-translation mode can be changed over by performing a predetermined special hand gesture,
(4) A method in which the non-translation mode is selected when at least one hand is placed at the home position. Other methods can also be considered.
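The changeover methods (1) to (4) can be sketched as simple predicates. The source presents them as alternative methods; combining all four in one check, as done here, is an assumption made only to keep the illustration short, and the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InputState:
    button_on: bool      # method (1): a button selects the translation mode
    face_forward: bool   # method (2): the face is looking forward
    hand_at_home: bool   # method (4): at least one hand is at the home position


def apply_toggle(toggled_on, special_gesture_detected):
    """Method (3): a predetermined special hand gesture flips the mode."""
    return (not toggled_on) if special_gesture_detected else toggled_on


def translation_enabled(state, toggled_on):
    """Return True for the translation mode, False for the non-translation mode.

    Assumption: every method must agree before hand movements are translated.
    """
    return (
        state.button_on
        and state.face_forward
        and not state.hand_at_home
        and toggled_on
    )


if __name__ == "__main__":
    signing = InputState(button_on=True, face_forward=True, hand_at_home=False)
    drinking_coffee = InputState(button_on=True, face_forward=True, hand_at_home=True)
    print(translation_enabled(signing, toggled_on=True))
    print(translation_enabled(drinking_coffee, toggled_on=True))
```

A device that implements only one of the methods would keep just the corresponding condition.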
In the second place, at the videophone device on the side of a normal person, not only the actual image but also an animation can be displayed. When an aurally handicapped person talks with a normal person whom he or she does not know well, in most cases the aurally handicapped person is reluctant to show his or her actual image. In particular, a female person is in many cases reluctant to show her actual image when the call is from a stranger. Therefore, the sign-language-to-Japanese-translation-subsystem comprises a conversion means which converts the input hand gesture data to a sign language animation, using the hand gestures input from the sign language input means and the facial expressions captured by a camera and recognized. In the image mode, the actual image data from the camera are displayed, and in the animation mode, the sign language animation is displayed for the protection of privacy.
In the third place, the display on the sign language telephone device on the side of an aurally handicapped person and the display on the videophone device on the side of a normal person are synchronized. It takes time to translate the sign language of an aurally handicapped person to Japanese. Therefore, there is a possibility that the actual image of the aurally handicapped person and the voice and character train of the Japanese translated from his or her sign language are presented discontinuously and asynchronously on the time axis on the screen of the videophone device on the side of the normal person. The present invention comprises a means for presenting them in synchronization.
It also takes time to recognize the spoken Japanese of a normal person, convert it to a character train, and convert it to a sign language animation. Therefore, there is a possibility that the actual image of the normal person sent to the screen of the sign language telephone device on the side of the aurally handicapped person and the sign language animation obtained by translating the spoken Japanese of the normal person are displayed discontinuously and asynchronously on the time axis. The present invention comprises a means for displaying them in synchronization.
Concretely, the actual image is given a time stamp, and this time stamp is matched to the time stamp given to the translated and displayed image for synchronization.
For example, in the case of a direct conversation without using a sign language telephone device, the periods of time needed are as shown below:
0.0 sec to 2.0 sec [sign language] Good morning!,
2.0 sec to 5.0 sec [sign language] How are you?,
5.5 sec to 8.0 sec [voice] I'm fine.
When the conversation of the aurally handicapped person by way of the sign language comes to a stop, the conversation with the voice of the normal person is started. Assuming that translation in the sign language telephone device is started after an utterance is finished, the result of translation of the sign language conversation of 0.0 sec to 2.0 sec is, for example, delivered to the videophone as a synthesized voice during 2.0 sec to 4.0 sec.
0.0 sec to 2.0 sec [sign language] Good morning!
2.0 sec to 4.0 sec [synthesized voice] Good morning!
2.0 sec to 5.0 sec [sign language] How are you?
4.0 sec to 7.0 sec [synthesized voice] How are you?
7.0 sec to 10.0 sec [voice] I'm fine.
10.0 sec to 13.0 sec [sign language animation] I'm fine.
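The deviation in the timeline above can be made explicit with a short calculation. The intervals are copied from the example; the tuple layout and the function name are chosen here only for illustration.

```python
# (utterance, input_start, input_end, output_start, output_end) in seconds.
# "Input" is the original sign language or voice; "output" is its translation
# (synthesized voice or sign language animation) as listed in the example.
TIMELINE = [
    ("Good morning!", 0.0, 2.0, 2.0, 4.0),
    ("How are you?", 2.0, 5.0, 4.0, 7.0),
    ("I'm fine.", 7.0, 10.0, 10.0, 13.0),
]


def lags(timeline):
    """Return how long after the original each translated output starts."""
    return {
        label: out_start - in_start
        for label, in_start, _in_end, out_start, _out_end in timeline
    }


if __name__ == "__main__":
    for label, lag in lags(TIMELINE).items():
        print(f"{label}: translated output starts {lag:.1f} sec after the original")
```

Each translated output trails its original by 2.0 to 3.0 seconds in this example, which is the gap that the synchronization means has to absorb.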
If the actual image is transmitted to the videophone device on the side of a normal person without synchronization, the gesture "Good morning!" is sent first, and at the time when the gesture "How are you?" is being performed, the sign language "Good morning!" is output as a translated synthesized voice, so that the actual image and the synthesized voice resulting from translation deviate from each other in time. This gives the receiver an incongruous feeling, so it is desirable that the actual image and the synthesized voice be synchronized. In the present invention, the actual image and the conversation in the sign language are recorded together with time, and the result of translation of the sign language is given the time at which the sign language was actually performed. So that the two coincide in time, the actual image and the synthesized voice are synchronized, and after that they are transmitted to the videophone device on the side of the normal person. For the actual image and voice sent from the side of the normal person, time is recorded in a similar way. When the voice is recognized, translated to the sign language, and displayed as an animation, the actual image and the sign language animation are synchronized by utilizing the recorded time and then displayed. When they are synchronized, in some cases a shortage of actual images or of display time occurs. In the present invention, when the actual images to display are not enough, a still picture taken at the time when the shortage becomes apparent is displayed. When the display time is not enough, the actual image is fast-forwarded or its display is omitted.
In the fourth place, as a means for a response message to a telephone call when nobody is in, the present invention comprises a means to prepare a message by combining items selected from voices, images, characters and sign language animations. In this case too, as mentioned for the second feature, preparing the response message with an animation instead of an actual image is effective from the point of view of protecting privacy.
In the fifth place, the present invention comprises a means to display, on the videophone on the side of a normal person, characters which are the results of recognition of the voice of the normal person, together with a character train which is obtained by translating the result of recognition of the sign language into Japanese. Thereby, on the side of the normal person, it is possible to confirm whether the contents of what he or she has spoken are correctly transmitted to the sign language telephone device.
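One way to picture the time-stamp-based synchronization is a small planning function that decides how a recorded image segment is shown during the interval in which its translation is output. This is a sketch under assumptions: the function name, the returned fields, and the choice of fast-forwarding (rather than omitting the display) when time is short are illustrative and not taken from the source.

```python
def plan_playback(capture_start, capture_end, output_start, output_end):
    """Plan how a recorded image segment is shown alongside its translation.

    capture_* are the time stamps recorded with the actual image;
    output_* are the times at which the translated result is presented.
    """
    captured = capture_end - capture_start
    available = output_end - output_start
    if available <= 0:
        raise ValueError("the output interval must have a positive length")
    if captured <= available:
        # Not enough actual images: play at normal speed, then hold a still
        # picture for the remaining time.
        return {"start": output_start, "rate": 1.0, "hold_still": available - captured}
    # Not enough display time: play the images faster (fast-forward).
    return {"start": output_start, "rate": captured / available, "hold_still": 0.0}


if __name__ == "__main__":
    # "Good morning!" was signed during 0.0-2.0 sec and voiced during 2.0-4.0 sec.
    print(plan_playback(0.0, 2.0, 2.0, 4.0))
    # A hypothetical 2-second gesture whose translation takes 3 seconds to speak.
    print(plan_playback(0.0, 2.0, 2.0, 5.0))
    # A hypothetical 3-second gesture whose translation is spoken in 2 seconds.
    print(plan_playback(2.0, 5.0, 4.0, 6.0))
```

The image segment is delayed so that it starts together with the translated output; the still picture and the faster playback correspond to the two shortage cases described above.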
Further objects and configurations will be made clear by the explanation of the embodiments described below.