1. Field of the Invention
The present invention relates to voice recognition techniques and circuits, and more particularly, to a system for more accurate and noise-tolerant voice recognition by analyzing redundant features of a source signal.
2. Description of the Related Art
Various signal processing techniques have been developed for analyzing digitized speech signals in order to recognize the underlying content of such speech. Once recognized, this content can then be used to control a handheld telephone, computer, household appliance, or other device. Some such known techniques employ the short-time Fourier spectrum, or “spectrogram,” of a speech signal, which is computed using windowed Fourier transforms as explained more fully in Rabiner et al., Fundamentals of Speech Recognition, the entirety of which is incorporated herein by reference.
FIG. 1 shows one known spectral feature extractor 100 for spectral analysis, which includes stages of windowing 102, FFT 104, MEL/BARK filtering 106, Log 108, and RASTA filtering 110. A digitized input speech signal 101 is fed into the windowing stage 102, which divides the input signal into smaller segments of appropriate duration, such as 20 milliseconds. The FFT stage 104 applies a Fast Fourier Transform to the windowed segments output by the stage 102. The MEL/BARK stage 106 warps the linear frequency scale to a different scale, so that the resolution for lower frequencies is greater than that for higher frequencies; that is, the resolution on the frequency scale becomes progressively coarser from low frequencies to high frequencies in the hearing range. The MEL scale and the BARK scale are two known transformations that produce this frequency warping, and these two (and some variations) are commonly used in speech recognition. The Log stage 108 takes the logarithm of each MEL/BARK transformed spectral value that has been computed. The foregoing stages 102-108 are described in various known publications, with one example being the above-cited text, Rabiner et al., Fundamentals of Speech Recognition.
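By way of illustration only, the stages 102-108 described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular extractor: the 8 kHz sample rate, Hamming window, 50% frame overlap, triangular filter shape, and the mel formula 2595·log10(1 + f/700) are commonly used choices that the foregoing description does not specify, and all function and parameter names are hypothetical. A naive discrete Fourier transform stands in for the FFT so that the example needs only the standard library.

```python
import cmath
import math

def hz_to_mel(f):
    # One commonly used mel formula (an assumption, not taken from the text).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def windowed_frames(signal, frame_len, hop):
    """Windowing stage: split the signal into Hamming-windowed segments."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    return [[signal[start + n] * window[n] for n in range(frame_len)]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def power_spectrum(frame):
    """Transform stage: naive DFT power for bins 0..N/2 (an FFT yields the same values faster)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def mel_filterbank(num_bands, n_fft, sample_rate):
    """Warping stage: triangular filters spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (num_bands + 1)) for i in range(num_bands + 2)]
    bank = []
    for b in range(num_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        row = []
        for k in range(n_fft // 2 + 1):
            f = k * sample_rate / n_fft
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_mel_features(signal, sample_rate=8000, frame_ms=20, num_bands=16):
    """Run windowing, transform, mel warping, and log on a list of samples."""
    frame_len = int(sample_rate * frame_ms / 1000)
    bank = mel_filterbank(num_bands, frame_len, sample_rate)
    features = []
    for frame in windowed_frames(signal, frame_len, frame_len // 2):
        spectrum = power_spectrum(frame)
        # Log stage: a small floor avoids log(0) for empty bands.
        features.append([math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10)
                         for row in bank])
    return features
```

Because the filters are evenly spaced in mel rather than in hertz, the low-frequency bands in this sketch are narrow and the high-frequency bands are wide, which is the progressively coarser resolution described above.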
The RASTA stage 110 filters the output of the Log stage 108 with a predefined bandpass filter. For example, if there are sixteen BARK bands, there will be sixteen filters, one operating on each of the BARK bands. The RASTA stage 110 may be implemented by any known RASTA processing technique, with one example being described in U.S. Pat. No. 5,450,522 entitled “Auditory Model for Parameterization of Speech” to Hermansky et al., the entirety of which is incorporated herein by reference.
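As a hedged illustration of per-band bandpass filtering, the sketch below applies one commonly cited RASTA-style filter, H(z) = 0.1(2 + z^-1 - z^-3 - 2z^-4)/(1 - 0.98z^-1), to the time trajectory of each band. The coefficients and the pole value are assumptions drawn from the general RASTA literature rather than from the foregoing description, and the function name and its arguments are hypothetical.

```python
def rasta_filter(log_bands, pole=0.98):
    """Filter each band's trajectory over time with a RASTA-style IIR bandpass filter.

    log_bands is a list of frames, each frame a list of per-band log values.
    """
    # FIR numerator 0.1 * [2, 1, 0, -1, -2]; it sums to zero, so constant (DC) offsets are removed.
    numer = [0.2, 0.1, 0.0, -0.1, -0.2]
    num_frames = len(log_bands)
    num_bands = len(log_bands[0]) if log_bands else 0
    out = [[0.0] * num_bands for _ in range(num_frames)]
    for b in range(num_bands):          # one independent filter per band
        prev_y = 0.0
        for t in range(num_frames):
            fir = sum(numer[k] * log_bands[t - k][b]
                      for k in range(len(numer)) if t - k >= 0)
            prev_y = fir + pole * prev_y  # single-pole recursive part
            out[t][b] = prev_y
    return out
```

With sixteen bands per frame, the outer loop runs sixteen independent filters, one per band. A constant offset in the log domain, such as that introduced by a fixed channel gain, decays toward zero at the output.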
The output of the spectral feature extractor 100 comprises spectral output signals 111, which are thereafter processed by various subsequent techniques (not shown) to yield a “recognition answer” that gives the predicted content of the input speech signal. Recognition answers based on such spectral output signals 111 provide decent accuracy in low-noise environments. Advantageously, degradation of their accuracy occurs slowly with decreasing signal-to-noise ratios. Spectral output signals can be further processed in various ways. For instance, one approach further processes the spectral output signals 111 by a cepstral transformation 112 to yield cepstral output signals 114. One type of cepstral transformation 112, for example, utilizes a discrete cosine transform (DCT) followed by a dimensionality reduction. Broadly, “cepstrum” is explained as the inverse Fourier transform of the logarithm of the power spectrum of a signal, as further discussed in the following references, hereby incorporated by reference in their entirety: A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing; J. R. Deller, Jr., J. G. Proakis and J. H. L. Hansen, Discrete-Time Processing of Speech Signals; and L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals.
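For illustration, a cepstral transformation of the DCT-plus-reduction type mentioned above might be sketched as follows. The unnormalized DCT-II and the choice to keep the first eight coefficients are assumptions made for the example, and the function and parameter names are hypothetical; dimensionality reduction is shown here as simple truncation, which is only one possible approach.

```python
import math

def cepstral_transform(log_spectrum, num_coeffs=8):
    """Apply an unnormalized DCT-II to one frame of log-spectral values
    and keep only the first num_coeffs coefficients."""
    n = len(log_spectrum)
    return [sum(log_spectrum[i] * math.cos(math.pi * k * (i + 0.5) / n)
                for i in range(n))
            for k in range(num_coeffs)]
```

Applied to a frame of sixteen log-spectral values, this sketch returns eight cepstral values. Coefficient 0 reflects the overall level of the frame, while higher coefficients capture progressively finer detail of the spectral shape.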
In systems where the cepstrum 114 is calculated, the cepstrum (rather than the spectrum 111) is processed by statistical modeling techniques to yield a recognition answer. One benefit of basing recognition answers upon cepstral output signals 114 is that they provide more accurate voice recognition at low levels of noise. However, as noise increases, the error rate increases rapidly for these systems. Therefore, neither spectral nor cepstral voice recognition systems are entirely adequate for applications that could potentially encounter a wide range of noise levels.