This invention relates to the field of data collection for human speech recognition and to the use of a dual-channel audio and photo system for accomplishing such data collection.
Most present day speech recognition systems function by converting acoustic sound waves generated by human utterances into analog or digital data using special algorithms which consider only the audio information. There is, however, an additional source of information for speech recognition other than this audio signal which can be of significant benefit to the accuracy and speed of the speech recognition process. Deaf people who are trained lip-readers use this information by observing visual cues produced by the mouth and surrounding areas of a speaker. By way of the present invention, this same information is available in an improved format to an Automatic Speech Recognition (ASR) system and is believed to offer increased speech recognition accuracy and operating rates.
ASR is believed to offer a significant improvement in military environments including the control of a manned aircraft. The present human-machine interface in the cockpit of an aircraft appears to be nearing an upper limit of human capability since it is based on manual acts performed by the aircraft crew members and the time for performing such manual acts can be severely limited especially under combat conditions. Voice-controlled avionics will allow the pilot to command his/her aircraft simply by talking, thereby avoiding the time requirement and interference with other activities imposed by a manual control system.
ASR can also be effectively used in the office or industrial environment especially in connection with computer and automatic data systems where, according to present day technology, the keyboard is the major avenue of communication from human to computer.
The lack of accuracy and reliability in presently available speech recognition equipment is a major reason for nonuse of speech recognition systems in these applications. The improved data collection arrangement of the present invention is believed to address this accuracy and reliability difficulty.
The patent art includes a number of examples of combined photo and audio speech recognition systems. Included in this patent art is the U.S. Pat. No. 3,192,321 of E. G. Nassimbene, concerned with a headset having both microphone and photo pick-ups. Since the Nassimbene apparatus contemplates only a DC coupled photo signal collection and processing system, a ready distinction from the present invention is discernible.
The patent art of interest also includes U.S. Pat. No. 4,769,845 issued to H. Nakamura, concerned with a lip image speech recognition apparatus which employs a camera device in order to achieve data input. The non-camera or integrated image pick-up arrangement of the present invention is believed distinguished over this camera input nature of the Nakamura patent.
Also included in this patent art is U.S. Pat. No. 4,757,541 issued to R. L. Beadles, concerned with an audio-visual speech recognition system in which an optical scanning or a non-integrated photo signal pick-up is also employed.
Of additional interest with respect to the present invention is U.S. Pat. No. 4,961,177 which discloses a method and apparatus for inputting a voice through a microphone in which a camera system is used to keep the microphone located in an appropriate position with respect to a human speaker.