1. Field of the Invention
The present invention relates to a continuous sign language recognition apparatus and method. More particularly, the invention relates to a technique of generating reference sign language patterns used for the recognition of continuous sign language patterns through pattern matching between the continuous sign language patterns and the reference sign language patterns. The term "continuous sign language pattern" used in this specification also includes a template pattern.
The present invention also relates to a technique of recognizing a series of continuous sign language patterns which are contained in a sign language and are similar to reference sign language patterns.
The present invention also relates to a sign language translation system in which a recognized sign language is transferred in the system in the form of texts, voices, and sign languages of another type.
2. Description of the Related Art
As conventional techniques regarding sign language recognition, there have been proposed "Hand Motion Recognition Apparatus and Sign Language Translation System" in JP-A-2-144675 (first conventional technique) and "Hand Motion Recognition Method using Neuro-computer" in JP-A-3-186979 (second conventional technique). According to the first conventional technique, colored gloves are used to obtain the positional relation between fingers by an image recognition technique. This positional relation is matched with pre-stored finger spelling patterns to recognize each finger spelling. According to the second conventional technique, the correspondence between finger shape data inputted from glove-like means and the meaning of the finger shape is learned by a neural network, and the output obtained when finger shape data is inputted to the network is used as the recognized finger spelling.
A reference pattern to be used for the matching between continuous patterns and reference patterns has been obtained heretofore by linearly normalizing sample patterns for the reference pattern in the time axis direction and by simply averaging these normalized sample patterns.
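The linear normalization and simple averaging described above can be sketched as follows. This is only an illustrative sketch, not the procedure of any cited apparatus: it assumes one-dimensional sample patterns, and the function names `linear_normalize` and `average_reference` are hypothetical.

```python
def linear_normalize(pattern, length):
    """Resample a 1-D sequence to `length` points by linear interpolation
    along the time axis (assumed form of linear normalization)."""
    n = len(pattern)
    if n == 1 or length == 1:
        return [float(pattern[0])] * length
    out = []
    for i in range(length):
        pos = i * (n - 1) / (length - 1)   # position in the original time axis
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(pattern[lo] * (1 - frac) + pattern[hi] * frac)
    return out


def average_reference(samples, length):
    """Average the linearly normalized samples point by point."""
    normalized = [linear_normalize(s, length) for s in samples]
    return [sum(col) / len(col) for col in zip(*normalized)]


# Two samples of the same motion whose peak occurs at different times.
samples = [[0, 4, 0, 0, 0], [0, 0, 0, 4, 0]]
reference = average_reference(samples, 5)
```

In this toy case each sample has a peak of 4, but the averaged reference has two peaks of 2, because a linear time mapping cannot bring the peaks into alignment before they are averaged.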
"Motion Recognition Apparatus using Neurocomputer" has also been proposed in JP-A-4-51372 (third conventional technique). According to the third conventional technique, which is an improved version of the second conventional technique, a three-dimensional motion is detected time sequentially by using a recurrent type neural network to recognize the meaning of one motion (e.g., of a sign language).
A continuous DP matching scheme has also been proposed as the recognition method through the matching between time sequential continuous patterns and reference patterns ("Continuous Word Recognition using Continuous DP", by OKA, the Speech Study Group of Acoustical Society of Japan, S78-20, pp.145 to 152, 1978) (fourth conventional technique). According to the fourth conventional technique, continuous voice patterns are sequentially matched with reference voice patterns while moving the latter in the time axis direction to recognize reference voice patterns contained in the continuous voice patterns. This matching result is a time sequence of similarities between the continuous patterns and reference patterns. A minimum value of the similarities at a threshold level or lower is searched from the similarity time sequence for each reference pattern, and the time at the minimum value is used to identify a reference pattern candidate.
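The matching and candidate search described above can be illustrated with a simplified sketch. It is not the exact formulation of the cited continuous DP paper: the local distance (absolute difference of scalar samples), the three-way recurrence without slope weights, the normalization by reference length, and the names `spotting_distances` and `detect` are all assumptions made for illustration. The value computed here is a distance, so a lower value means a better match, which corresponds to searching for a minimum at or below a threshold.

```python
def spotting_distances(stream, reference):
    """For each time t of the stream, return the accumulated distance of the
    best alignment of the whole reference ending at t (free start point)."""
    J = len(reference)
    INF = float("inf")
    prev = [INF] * J
    out = []
    for x in stream:
        cur = [0.0] * J
        for j, r in enumerate(reference):
            d = abs(x - r)                      # assumed local distance
            if j == 0:
                cur[j] = d                      # the reference may start at any time
            else:
                cur[j] = d + min(prev[j], prev[j - 1], cur[j - 1])
        out.append(cur[-1] / J)                 # normalize by reference length
        prev = cur
    return out


def detect(distances, threshold):
    """Return the times at which the distance is a local minimum
    at or below the threshold."""
    INF = float("inf")
    hits = []
    for t, v in enumerate(distances):
        if v > threshold:
            continue
        left = distances[t - 1] if t > 0 else INF
        right = distances[t + 1] if t + 1 < len(distances) else INF
        if v <= left and v < right:
            hits.append(t)
    return hits


stream = [0, 0, 1, 2, 3, 0, 0]      # continuous pattern containing the reference
reference = [1, 2, 3]
distances = spotting_distances(stream, reference)
candidates = detect(distances, threshold=0.5)
```

Each stream sample updates one column of length J per reference pattern, so the work in this sketch grows with the length of the continuous pattern multiplied by the total length of the reference patterns.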
The first and second conventional techniques for sign language recognition are mainly directed to finger spellings, which are generally static patterns. They are therefore unable to recognize ordinary sign language with complicated motions of the fingers and hands.
If a reference pattern is obtained from an average of sample sign language patterns by linearly expanding/compressing in the time domain and normalizing them without considering nonlinear expansion/compression, the resultant reference pattern becomes damped and does not reflect the characteristics of original sample patterns as shown in FIG. 4A.
With the third conventional technique, more or less dynamic time sequential data can be recognized by using a recurrent type neural network. Although it is possible to recognize time sequential data representing one motion, it is difficult to properly cut out each sign language word from a series of words, which is how sign language is usually used in practice.
With the fourth conventional technique for the matching between continuous patterns and reference patterns, if data of both the continuous patterns and reference patterns sampled at a predetermined timing is used as it is, the time required for the matching increases in proportion to the length of the continuous patterns and the number of reference patterns.
Other issues associated with sign language translation are as follows.
(1) Finger shapes, hand positions, and their motions differ from person to person. As is known from voice recognition, the voice of a known person can be recognized more easily than the voices of unknown persons. In the case of finger spellings, the number of finger spellings is as small as 50 words. It is therefore possible to register the finger spellings of a particular person, or to learn the weight coefficients of a neural network dedicated to a particular person. However, in the case of a sign language, the number of basic words is as large as 1000 words or more. Therefore, the registration or learning for a particular person is impossible.
(2) It is frequently desired to refer to or store past sign language conversations. However, this function has not yet been realized.
(3) There are few sign words that are accompanied by emotion. For this reason, facial expressions or large body motions have been used. However, a normal person generally concentrates on the recognition of the sign language only, and the facial expression or large body motion is often disregarded. Accordingly, in order to realize a natural speech, it is necessary for a sign language translation system to provide a function of translating a sign language with emotion.