The recent development of speech recognition technology has opened up a new era of man-machine interaction. A speech user interface provides a convenient and highly natural method of data entry. However, traditional speech recognizers use complex algorithms which in turn require large storage systems and/or dedicated digital signal processors with high-performance computers. Further, due to their computational complexity, these systems generally cannot recognize speech in real time. Thus, a need exists for an efficient speech recognizer that can operate in real time and that does not require a dedicated high-performance computer.
The advent of powerful single-chip computers has made possible compact and inexpensive desktop, notebook, notepad and palmtop computers. These single-chip computers can be incorporated into personal items such as watches, rings, necklaces and other forms of jewelry. Because these personal items are accessible at all times, the computerization of these items delivers truly personal computing power to the users. These personal systems are, however, constrained by battery capacity and storage capacity. Further, due to its miniature size, a computer mounted in a watch or a piece of jewelry cannot house a bulky keyboard for text entry or a writing surface for pen-based data entry. Thus, a need exists for an efficient speaker-independent, continuous speech recognizer to act as a user interface for these tiny personal computers.
U.S. Pat. No. 4,717,261, issued to Kita et al., discloses an electronic wrist watch with a random access memory for recording voice messages from a user. Kita et al. further discloses the use of a voice synthesizer to reproduce keyed-in characters as voice or speech from the electronic wrist watch. However, the Kita watch only passively records and plays back audio messages; it does not recognize the user's voice or act in response thereto.
U.S. Pat. No. 4,509,133, issued to Monbaron et al., and U.S. Pat. No. 4,573,187, issued to Bui et al., disclose watches which recognize and respond to verbal commands. These patents teach the use of preprogrammed training for the references stored in the vocabulary. When the user first uses the watch, he or she pronounces a word corresponding to a command to train the recognizer. After training, the user can repeat the trained word until the watch display shows the correct word. U.S. Pat. No. 4,635,286, issued to Bui et al., further discloses a speech-controlled watch having an electro-acoustic means for converting a pronounced word into an analog signal representing that word, a means for transforming the analog signal into logic control information, and a means for transforming the logic information into a control signal to control the watch display. When a word is pronounced by a user, it is coded and compared with a part of the memorized references. The watch retains the reference whose coding is closest to that of the word pronounced. The digital information corresponding to that reference is converted into a control signal which is applied to the control circuit of the watch. Although wearable speech recognizers are shown in Bui and Monbaron, the devices disclosed therein do not provide for speaker-independent speech recognition.
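The closest-reference matching described above can be illustrated with a minimal sketch. It is not the implementation disclosed in the cited patents, which do not specify the coding in the passage above. The sketch assumes that each word is coded as a short fixed-length list of numbers and that closeness is measured by Euclidean distance. The function names, the reference codes and the control signal values are all hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length codes (an assumed metric)."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_reference(word_code, references):
    """Return the label of the stored reference whose code is nearest to word_code."""
    if not references:
        raise ValueError("no references stored")
    return min(references, key=lambda label: distance(word_code, references[label]))


# Hypothetical references, as they might be memorized after one user's training.
REFERENCES = {
    "time": [0.9, 0.1, 0.3],
    "date": [0.2, 0.8, 0.5],
    "alarm": [0.4, 0.4, 0.9],
}

# Hypothetical control signals applied to the watch control circuit.
CONTROL_SIGNALS = {"time": 0x01, "date": 0x02, "alarm": 0x03}


def recognize(word_code):
    """Retain the closest reference and return it with its control signal."""
    label = closest_reference(word_code, REFERENCES)
    return label, CONTROL_SIGNALS[label]


if __name__ == "__main__":
    # A coded utterance that lies near the stored "time" reference.
    print(recognize([0.85, 0.15, 0.25]))
```

Because the stored references in such a scheme come from one user's training utterances, a different speaker's coding of the same word may fall closer to the wrong reference, which is consistent with the speaker-dependent limitation noted above.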
Another problem facing the speech recognizer is the presence of noise, as the user's verbal commands and data entry may be made in a noisy environment or in an environment in which multiple speakers are speaking simultaneously. Additionally, the user's voice may fluctuate due to the user's health and mental state. These voice fluctuations severely test the accuracy of traditional speech recognizers. Thus, a need exists for an efficient speech recognizer that can handle medium and large vocabularies robustly in a variety of environments.
Yet another problem facing the portable voice recognizer is power consumption. Traditional speech recognizers require the computer to continuously monitor the microphone for verbal activity directed at the computer. However, continuous monitoring for speech activity, even during extended periods of silence, wastes a significant amount of battery power. Hence, a need exists for low-power monitoring of speech activity to wake up a powered-down computer when commands are being directed to the computer.
Speech recognition is particularly useful as a data entry tool for a personal information management (PIM) system, which handles telephone numbers, appointments, travel expenses, time entry, note-taking and personal data collection, among others. Although many personal organizers and handheld computers offer PIM capability, these systems are largely underutilized because of the tedious process of keying in data using a miniaturized keyboard. Hence, a need exists for an efficient speech recognizer for entering data into a PIM system.