1. Technical Field
The present invention relates to a speech recognition device and an operating method thereof, and more particularly, to a speech recognition device and an operating method thereof that improve speech recognition performance by using multi-sensor data when speech data and multi-sensor data are input from a speech recognition terminal.
2. Description of the Related Art
A conventional method for improving speech recognition performance with information other than speech is to use camera image information together with speech information.
Audio-visual speech recognition, which applies image processing to lip reading in order to aid speech recognition, has been developed for this purpose. Image information is used for lip reading to supplement acoustic models, which are sensitive to noise. This approach provides additional feature information to the acoustic model by matching speech with lip images.
Audio-visual techniques are also used for speech endpoint detection in noisy environments, such as inside a car, where the start and end points of speech are detected by tracking the shape of the mouth.
A great deal of research and development is currently under way on speech detection using the multiple sensors included in a terminal.