1. Field of Invention
The present invention relates to a behavior recognition system and a behavior recognition method, and more particularly to a behavior recognition system and a behavior recognition method that combine an image and speech and are applicable to recognizing a correct behavior through a sequence correspondence between the image and the speech.
2. Related Art
FIG. 1A is a schematic view of image recognition in the prior art, and FIG. 1B is a schematic view of speech recognition in the prior art.
In the prior art, recognition technology includes image recognition and speech recognition. Taking the image recognition technology as an example, a plurality of image samples is stored in a recognition host 2. A camera module 11 captures a gesture of a user to generate a gesture image, and the gesture image is matched with the image samples, so as to obtain an execution instruction corresponding to the gesture image. Moreover, image feature extraction is performed on the whole gesture image, so as to enhance the recognition rate of the gesture image through a feature value comparison technology.
As for the speech recognition technology, a plurality of speech samples is stored in the recognition host 2. A microphone 12 receives a sound made by the user to generate speech data, and the speech data is matched with the speech samples, so as to obtain an execution instruction corresponding to the speech data. Moreover, speech feature extraction is performed on the whole speech data, so as to enhance the recognition rate of the speech data through the feature value comparison technology.
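The sample matching described above for both the gesture image and the speech data can be illustrated with a minimal sketch. It assumes that each stored sample has already been reduced to a numeric feature vector and that the feature value comparison is a Euclidean nearest-sample search; the prior art does not specify the comparison method, and the function name `nearest_sample`, the sample table, and the feature values are hypothetical.

```python
import math


def nearest_sample(features, samples):
    """Return (instruction, distance) for the stored sample closest to `features`.

    `samples` maps an execution instruction to its stored feature vector.
    Euclidean distance is an assumed stand-in for the unspecified
    feature value comparison.
    """
    best_instruction, best_distance = None, math.inf
    for instruction, sample_features in samples.items():
        distance = math.dist(features, sample_features)
        if distance < best_distance:
            best_instruction, best_distance = instruction, distance
    return best_instruction, best_distance


# Hypothetical image samples stored in the recognition host.
GESTURE_SAMPLES = {
    "sit": (0.9, 0.1, 0.0),
    "stand": (0.1, 0.9, 0.0),
    "wave": (0.0, 0.2, 0.9),
}

# Hypothetical feature vector extracted from a captured gesture image.
instruction, distance = nearest_sample((0.8, 0.2, 0.1), GESTURE_SAMPLES)
print(instruction)
```

The same routine would apply to speech data if the speech samples were stored as feature vectors of equal length.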
In order to enhance the recognition rate, some manufacturers have further proposed recognition technologies that combine a gesture image and speech data. As for the image recognition technology, although current image recognition systems incorporate the image feature extraction technology, they do not consider feature extraction errors caused by repetitive gesture images, so that the recognition rate decreases rather than increases. Furthermore, if the image recognition technology is not used together with the speech recognition technology, once the gesture image recognition fails, the recognition system cannot correctly derive the intentions of human behaviors and motions. Similarly, if the speech recognition technology is not used together with the image recognition technology, once the speech data recognition fails, the recognition system likewise cannot correctly derive those intentions. Moreover, the recognition technologies that combine the gesture image and the speech data usually combine them in a linear manner. Once the recognition system fails to recognize either the image or the speech due to external factors (for example, the speech data includes excessive noise, the gesture image includes excessive light source interference, or abnormal feature data is extracted), an incorrect recognition result occurs during the linear combination of the gesture image and the speech data.
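The weakness of a linear combination can be illustrated with a minimal sketch. It assumes that each recognizer outputs a score per candidate behavior and that the combination is a fixed weighted sum; the function name `fuse_linear`, the equal weights, and all score values are hypothetical and are not taken from any particular prior-art system.

```python
def fuse_linear(image_scores, speech_scores, w_image=0.5, w_speech=0.5):
    """Pick the behavior with the highest weighted sum of the two scores.

    Only behaviors scored by both recognizers are considered.
    """
    labels = image_scores.keys() & speech_scores.keys()
    fused = {
        label: w_image * image_scores[label] + w_speech * speech_scores[label]
        for label in labels
    }
    return max(fused, key=fused.get)


# Hypothetical scores: the user actually performs and says "sit".
image_scores = {"sit": 0.7, "stand": 0.3}
clean_speech = {"sit": 0.8, "stand": 0.2}
noisy_speech = {"sit": 0.1, "stand": 0.9}  # speech corrupted by noise

print(fuse_linear(image_scores, clean_speech))  # both modalities agree
print(fuse_linear(image_scores, noisy_speech))  # the corrupted modality dominates
```

In this sketch the image recognizer is correct in both cases, yet the corrupted speech scores alone are enough to flip the combined result, because a fixed weighted sum has no means of discounting the unreliable modality.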
Therefore, manufacturers are seeking ways to reduce the influence of external interference factors on the recognition system, to reduce the occurrence of abnormal feature extraction by the recognition system, and to enhance the recognition rate for human behaviors and motions.