The present invention relates generally to speech recognition systems. In particular, the invention is a substantially language independent, voice operated information management device and associated method.
Speech storage and recognition technology has been implemented in a number of devices. Digital voice recorders capture, compress, and store speech, but they are not as useful for retrieving information, partly because the compression technologies used are not well-suited for known speech recognition pattern matching techniques. Users are typically limited to around 45 minutes of speech storage and must use buttons to navigate through sound files.
Speech dictation systems are designed to translate spoken language to text. This is a very challenging task that requires computer systems with large amounts of memory and processing power. Speech dictation systems have difficulty with words that are outside of their built-in vocabulary and difficulty achieving good performance across speaker variation. Also, these systems are language specific. Developing speech-to-text and text-to-speech systems for a specific language is difficult and expensive.
Voice command systems are designed to operate computer programs and databases with voice commands in place of a keyboard and mouse. These systems typically use speaker-independent, language-dependent speech technology. Limited vocabularies and simple grammar syntax give command and control systems a performance edge over open dictation systems.
Using voice to operate computer applications is usually less convenient than using a keyboard and mouse. However, voice command systems have had some success. For example, army mechanics have experimented with voice-enabled portable computers that allow hands-free access to maintenance databases.
Voice dialing systems are voice command systems that use a combination of command recognition and user registered sound pattern matching. Such systems allow the user to store a voice pattern with a particular number, then use voice pattern matching to automatically re-dial the number.
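The user-registered sound pattern matching described above can be illustrated with a minimal sketch. The text does not name a matching algorithm; dynamic time warping (DTW) is assumed here only because it is a classic way to compare utterances spoken at different speeds. The names `dtw_distance` and `best_match`, and the representation of each utterance as a list of feature vectors, are hypothetical.

```python
from math import dist, inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors.

    Assumption: each utterance has already been converted to a list of
    equal-length numeric feature vectors (one per audio frame).
    """
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i frames of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def best_match(query, registered):
    """Return the label of the registered pattern closest to the query."""
    return min(registered, key=lambda label: dtw_distance(query, registered[label]))
```

In a voice dialer of the kind described, `registered` would map each stored voice pattern to its telephone number entry, and the label returned by `best_match` would select the number to re-dial. A practical system would also reject queries whose best distance exceeds some threshold.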
Personal digital assistants (PDAs) are popular tools. They have moderate computational power and memory, making them less expensive and more portable than desktop personal computers (PCs), laptop PCs, or even handheld PCs. PDAs allow users to access information, store information, and perform tasks while away from more powerful but less portable devices.
There are several drawbacks with current PDAs. Entering information is tedious. With a pen-based PDA, entering simple things, such as a recipe, can take upwards of 5 minutes.
Entering information is a dedicated task. One cannot conveniently use today's PDAs while walking, driving, reporting on a live event, pulling items from a shelf, etc. Current PDAs require the user's hands and eyes to operate.
PDAs are still larger than many people would like. A system that is not conveniently worn, like a wristwatch, will often be absent when needed most. Pen and keyboard-based PDAs are as small as they can get and still be usable.
In addition, it is impractical to integrate current speech recognition technologies with a PDA. First, PDA users expect data input to be reliable, but computer dictation systems make frequent mistakes. It is not known how to make effective speech dictation systems when: people speak with wildly different accents and pronunciations; people use out-of-vocabulary words, such as brand names, company names, and geographic names (new words are constantly being created, so it is impossible for a speech dictation dictionary to be prepared for all of them); and people insert filler words, misspeak, and create sentences unexpected by computational language models.
Second, using speech-to-text conversion programs requires the user to visually monitor the system's text output in order to make corrections and train new words during dictation. This is unacceptable in a portable, hands-free/eyes-free environment.
Third, while PDA users are worldwide, computer dictation systems are language specific. Creating a new dictation system for a language is expensive and time consuming. Many world languages have never had computer dictation systems developed for them. In addition, such dictation systems would not enable bi-lingual users to switch between or mix languages.
Fourth, consumers expect PDAs to be portable and relatively inexpensive, but dictation systems require tremendous computing power and memory. The computational power required to run speech dictation systems is beyond that found in most PDAs. Increasing power increases cost and battery weight.
Fifth, consumers expect PDA data output to be reliable, but re-synthesizing speech from dictated text can introduce further errors. If users expect to retrieve data through PDA speech synthesis, it is better not to start from just text information because mispronunciations and incorrect prosody can make text-to-speech systems difficult to understand.
The present invention is a voice operated portable information management system that is substantially language independent and capable of supporting a substantially unlimited vocabulary. One embodiment of the invention includes an input transducer for receiving a user's speech and an output transducer for outputting sound including speech. A speech processing system is coupled to the input and output transducers and includes means for: 1) generating and storing compressed speech data corresponding to a user's speech received through the input transducer; 2) comparing the stored speech data; 3) re-synthesizing the stored speech data for output as speech through the output transducer; 4) providing an audible user interface including a speech assistant for providing instructions in the user's language; 5) storing user-specific compressed speech data, including commands, received in response to prompts from the speech assistant for purposes of adapting the system to the user's speech; 6) identifying memo management commands spoken by the user, and storing and organizing compressed speech data as a function of the identified commands; and 7) identifying memo retrieval commands spoken by the user, and retrieving and outputting the stored speech data as a function of the commands.
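As a rough illustration of items 6) and 7) above, the following sketch stores and retrieves compressed memo data as a function of an already-identified command. It is not the implementation of the invention. The class `MemoStore`, the command strings `"store"` and `"retrieve"`, and the folder labels are hypothetical, and `zlib` is used purely as a stand-in for a speech compression scheme, which the text does not specify.

```python
import zlib
from collections import defaultdict


class MemoStore:
    """Hypothetical memo organizer keyed by folder label."""

    def __init__(self):
        # folder label -> list of compressed memo blobs, in the order stored
        self._folders = defaultdict(list)

    def handle(self, command, folder, audio=b""):
        """Act on a command that is assumed to have been recognized already."""
        if command == "store":
            # zlib stands in for a real speech codec (assumption)
            self._folders[folder].append(zlib.compress(audio))
            return None
        if command == "retrieve":
            return [zlib.decompress(blob) for blob in self._folders.get(folder, [])]
        raise ValueError(f"unknown command: {command}")
```

In the system described, the command and the folder label would themselves be matched against the user's own stored speech patterns rather than against text strings, which is what allows the device to remain substantially language independent.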
Another embodiment of the invention includes one or more switches actuated by a user to indicate memo management commands and/or memo retrieval commands. The speech processing system is coupled to the one or more switches and operates as a function of the actuated switches.
Yet another embodiment of the invention includes a clock and a global positioning sensor (GPS) connected to the speech processing system. The speech processing system identifies temporal commands spoken by the user, stores and organizes compressed speech data and temporal actions, including alarms, as a function of the commands, and responds to the stored temporal actions. The speech processing system also identifies geographic commands spoken by a user, stores and organizes compressed speech data and geographic actions, including alarms, as a function of the commands, and responds to the stored geographic actions.
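The temporal and geographic alarms described above can be sketched as a simple check against the current clock reading and GPS position. This is a minimal sketch under stated assumptions: the names `TimeAlarm`, `GeoAlarm`, `haversine_m`, and `due_memos` are hypothetical, a geographic alarm is assumed to fire when the device comes within a fixed radius of a stored location, and distance is computed with the standard haversine formula on a spherical Earth.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dlat, dlon = p2 - p1, radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(p1) * cos(p2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


@dataclass
class TimeAlarm:
    memo_id: str
    when: datetime  # fire once the clock reaches this time


@dataclass
class GeoAlarm:
    memo_id: str
    lat: float
    lon: float
    radius_m: float  # fire when the device is within this distance


def due_memos(alarms, now, lat, lon):
    """Return memo ids whose temporal or geographic condition is currently met."""
    due = []
    for alarm in alarms:
        if isinstance(alarm, TimeAlarm):
            if now >= alarm.when:
                due.append(alarm.memo_id)
        elif haversine_m(lat, lon, alarm.lat, alarm.lon) <= alarm.radius_m:
            due.append(alarm.memo_id)
    return due
```

A device of this kind might call `due_memos` periodically and re-synthesize each returned memo through the output transducer. Removing or rescheduling an alarm after it fires is left out of the sketch.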