Speech recognition and voice activation have gained increasing popularity as software-based speech engines have become more powerful and microprocessor speeds have reached 1 GHz. Originally conceived as a tool for taking dictation and affording limited control of the operating system, speech recognition is becoming, and will continue to become, increasingly pervasive in all types of applications. Speech is the most natural and efficient form of communication. It can be incorporated into all control commands and various applications such as web browsers and search engines. Speech can be used to automate many operations performed manually on a computer. Some modern computers are even designed to be controlled primarily by voice. Two examples of this are U.S. Pat. Nos. 5,305,244 and 5,844,824, which teach a voice activated wearable computer that allows users to operate the computer in a hands-free mode. The disclosure of the '244 patent states, “The computing apparatus includes a voice recognition module, in communication with a processor, for receiving audio commands from the user, for converting the received audio command into electrical signals, for recognizing the converted electrical signals and for sending the recognized signals to the processor for processing, the voice recognition module being supported by the user.” The '824 patent further discloses, “. . . a body-worn, hands-free computer system, which does not rely upon a keyboard input or activation apparatus but rather has various activation means, all of which are hands free.” One of these activation means is speech. Thus, these two patents teach a computer hardware platform which permits control of the operating system and various applications using voice as the primary means of activation.
Most people can speak about five times faster than they can type and probably ten times faster than they can write. Thus, there are significant gains in efficiency to be achieved from a successful integration of speech recognition and processing into personal computers. Speech is currently processed on computers primarily in software. The sound card is used as an audio input and contains an analog to digital (A/D) converter that takes the sounds/words picked up by a standard analog microphone and converts them to a digital bit stream, which is passed on to the microprocessor. Software stored in memory is then used in tandem with the CPU to process the signal representation of the voice, whether a command or just text, and to execute the appropriate command or function. The leading software applications for this kind of interaction are IBM Corporation's ViaVoice® and Dragon Systems Corporation's NaturallySpeaking®. Both are speech recognition programs whose software engines rely on the CPU of the computer for all processing of speech. This task is computationally intensive; it ties up the CPU and significantly limits the system resources available for other work. In desktop or laptop environments running off of AC power, this merely degrades system performance. However, in mobile and wearable environments, where power is usually supplied by batteries, it also causes excess power consumption, because there is a direct correlation between the clock cycles performed by the CPU and power consumption. Additionally, in these mobile/wearable environments, where space is limited and little or no active cooling is employed, excess heat generation can degrade memory, the motherboard, and other silicon-based electronic components, and can also force a reduction in CPU speed to accommodate the heat buildup.
Thus, an architecture which extends usable battery life and reduces heat buildup in the CPU while efficiently and effectively processing speech would be a significant advancement over the state of the art.
Recently, digital signal processor (hereinafter DSP) chips have come into use for processing natural speech. An example is customer service phone systems in which callers can speak their input as well as key it in on the keypad integral to their telephones. The DSP is integrated into the phone tree system. When the system receives a signal representative of a spoken word, the DSP matches it against known signals representative of known words and enters the matched word as input data. These systems, however, are generally limited to numeric recognition and are not available in consumer-oriented products.
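The matching step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the method used by any particular phone system: the feature vectors, the `TEMPLATES` table, the 0.8 threshold, and the function names are all hypothetical, and normalized correlation is assumed here as the similarity measure.

```python
import math


def normalized_correlation(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a silent (all-zero) signal matches nothing
    return dot / (norm_a * norm_b)


def match_word(signal, templates, threshold=0.8):
    """Return the label of the best-scoring template, or None if no
    template reaches the (assumed) acceptance threshold."""
    best_label, best_score = None, threshold
    for label, template in templates.items():
        score = normalized_correlation(signal, template)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical stored templates for two spoken digits (made-up values).
TEMPLATES = {
    "one": [0.9, 0.1, 0.3, 0.7],
    "two": [0.2, 0.8, 0.6, 0.1],
}

recognized = match_word([0.85, 0.15, 0.35, 0.65], TEMPLATES)
```

Each comparison is an inner product followed by a normalization, which is the kind of multiply-and-accumulate workload a DSP is built to execute efficiently.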
A DSP is essentially a programmable microprocessor which can be applied to a variety of specific applications. It includes special logic hardware for executing mathematical functions at speeds, power consumption levels, and efficiencies not usually associated with general purpose microprocessors. These chips can be programmed to perform various signal processing functions. Many commercially available expansion cards for PCs include DSPs, generally together with a software application for programming them to perform signal processing functions. Because of their hardware and architecture, DSPs are generally better suited than general purpose microprocessors to performing certain computationally intensive functions.
The design of the DSP is typically optimized specifically for mathematical algorithms such as correlations, convolutions, finite impulse response (FIR) filters, infinite impulse response (IIR) filters, Fast Fourier Transforms (FFTs), matrix computations, and inner products, among other operations. Implementations of these mathematical algorithms generally comprise long sequences of systematic arithmetic/multiplicative operations. FFTs and filters are of particular relevance to the processing of speech.
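To make the phrase "long sequences of systematic arithmetic/multiplicative operations" concrete, the following Python sketch implements a direct-form FIR filter, y[n] = sum over k of h[k] * x[n-k]. The function name and the sample values are illustrative assumptions; input samples before the start of the signal are treated as zero.

```python
def fir_filter(samples, coefficients):
    """Direct-form FIR filter: y[n] = sum_k h[k] * x[n - k].

    Samples before the start of the input are treated as zero.
    """
    output = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(coefficients):
            if n - k >= 0:
                # One multiply-accumulate (MAC) step per filter tap.
                acc += h * samples[n - k]
        output.append(acc)
    return output


# Two-tap moving average applied to a short ramp (illustrative values).
smoothed = fir_filter([1.0, 2.0, 3.0, 4.0], [0.5, 0.5])
```

The inner loop is one multiply and one add per tap, repeated for every output sample. A DSP typically performs each such step with dedicated multiply-accumulate hardware, whereas a general purpose CPU issues separate instructions for the multiply, the add, and the loop bookkeeping.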
A CPU generally comprises an execution unit, cache memory, a memory management unit, and a floating point unit, as well as other logic. The task of a general purpose CPU is to execute code and perform operations on data in the computer memory, thus managing the computing platform. In general, the basic X86 or other type of computer CPU is designed primarily to perform Boolean, management, and data manipulation operations. The instructions executed by a general purpose CPU include basic mathematical functions. However, these functions are not well adapted to complex DSP-type mathematical operations. Thus, a general purpose CPU is required to execute a large number of instructions, relative to a DSP, to perform even basic DSP functions.
In the prior art there have been attempts, both in hardware and in software, at incorporating DSPs into the architecture of PCs to take advantage of the efficiencies associated with doing so. U.S. Pat. No. 5,794,068 (hereinafter designated as the '068 patent) teaches one example. The '068 patent discloses a general purpose CPU which contains a general purpose CPU core, such as an X86 core, and also includes a DSP core. The CPU includes a DSP function decoder or preprocessor which examines sequences of instructions and determines whether a DSP function is being performed. If the decoder determines that a DSP function is being executed, it converts the instruction sequences into a DSP macro and routes the macro to the DSP core. The DSP core is able to perform the DSP function in parallel with other operations performed by the general purpose CPU core. This design ensures backward compatibility with existing software packages, both those that require DSP operations to be performed and those that do not. However, the preprocessor introduces an extra step into the execution cycle: a disadvantage of the '068 patent is that each command must be decoded to check for DSP instructions before the command is processed. An additional disadvantage of the system of the '068 patent is that its architecture is not optimized for the processing of speech and does not teach the inclusion of a command and control speech engine residing in the DSP chip itself. Additionally, the DSP does not serve as the primary interface to all speech input signals originating from the audio input of the computer.
In another example, U.S. Pat. No. 5,915,236 (hereinafter the '236 patent) teaches a software approach to utilizing a DSP to process speech. The '236 patent teaches a word recognition system that detects the computational resources available to it, such as processor speed, the number of processors, and the presence of a DSP, and alters the instructions it executes in response to this detection to optimize the allocation of instructions. The system is primarily a speech recognition program, but the actual word recognition program can vary the computational intensity of its signal processing as a function of available computational resources. If the program detects both a CPU and a DSP, it can cause the DSP to determine when the program should interrupt the CPU. The program can also vary the rate at which it filters relatively low scoring words out of consideration during the recognition process as a function of the level of available resources. The disadvantage of this system is that it is a software-based solution that is inherently limited by the architecture of the computer it is running on. That is to say, in the absence of a DSP, the system will accept less robust and accurate performance. Furthermore, the software and CPU are required to check the code for DSP instructions, introducing an extra step into the process.
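The resource-dependent filtering attributed above to the '236 patent can be illustrated with a small Python sketch. It is not taken from that patent: the `Resources` fields, the capacity thresholds, the doubling applied when a DSP is present, and the beam widths are all hypothetical values chosen only to show the idea that a system with fewer resources prunes low scoring candidate words more aggressively.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpu_mhz: int
    processors: int
    has_dsp: bool


def beam_width(res):
    """Hypothetical rule: more resources allow a wider beam,
    so fewer candidate words are pruned."""
    capacity = res.cpu_mhz * res.processors
    if res.has_dsp:
        capacity *= 2  # assumed boost, for illustration only
    if capacity >= 1000:
        return 10.0
    if capacity >= 400:
        return 5.0
    return 2.0


def prune(candidates, width):
    """Keep only the words whose score is within `width` of the best score."""
    if not candidates:
        return {}
    best = max(candidates.values())
    return {word: s for word, s in candidates.items() if s >= best - width}


# Made-up recognition scores for three candidate words.
scores = {"yes": 9.0, "guess": 6.0, "mess": 1.0}
slow_machine = prune(scores, beam_width(Resources(200, 1, False)))
with_dsp = prune(scores, beam_width(Resources(200, 1, True)))
```

In this sketch the slower machine keeps only the top candidate, while the same machine with a DSP keeps two. This mirrors the trade-off noted above: without a DSP, the software accepts less robust recognition in exchange for a lighter computational load.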
Thus, there exists a need for a speech processing architecture for personal computers, especially mobile, hand-held, and wearable computers, which overcomes the above-noted deficiencies.