Recently, many devices marketed under the label "smart" have been launched. Among them, smart phones and smart TVs form the largest markets, and many services based on such equipment have also been launched. In particular, such equipment is now commonly released to the market with a built-in voice recognition system serving as a smart interface. This function is also widely used in advertising as evidence that the equipment incorporates the best technology.
However, the most important part of a voice recognition system is precisely detecting the voice uttered by the user. Although it is impossible to perfectly separate the user's actual voice from numerous surrounding noises, errors may be minimized if the sensors that are already mounted on currently launched equipment are used efficiently and comprehensively.
Currently, a smart TV or a smart phone is generally equipped with several microphones and one or two cameras. With the microphones, a voice signal that is not affected by the noise is received, so that the noise is removed and the resulting signal is used for voice recognition. Services that use camera-based techniques such as face recognition and gesture recognition are also provided.
In the related art, a lip-reading technique based on image recognition has been suggested as a method for detecting and recognizing a voice more precisely in a noisy situation. Most such methods recognize a face, then recognize the lips and track changes in the lip motion, and use the information obtained by the tracking as an auxiliary means for voice recognition. However, in a place with illumination noise, for example a place where the illumination is dark, it is impossible to track the motion of the lips.
Recently, lip-reading techniques based on both voice recognition and image recognition are being studied.