1. Field of the Invention
The present invention relates to a method of and an apparatus for using video signals along with audio signals of speech to provide speech recognition, within an environment where speech recognition is necessary. In particular, the present invention relates to a method of and a system for using video images of lip movements with audio input signals to improve speech recognition. The present invention can be implemented in a handheld device, and the invention may include discrete devices or may be implemented on a semiconductor substrate such as a silicon chip.
2. Description of the Related Art
Human speech is made up of numerous different sounds and syllables. In many languages, different sounds and/or syllables are combined to form words and/or sentences. The combination of the sounds, syllables, words, and sentences forms the basis for oral communication.
Generally, human speech is recognizable if the speech is clear and comprehensible to another human's ears. On the other hand, human speech can be recognized by a machine if the audio waves of the speech are received and are recognizable by an algorithm operating within the machine. Although audio speech recognition by machines has advanced in sophistication, the accuracy of audio speech recognition has room for improvement.