This invention relates to a system for the processing of signals to extract desired information. The invention is particularly applicable to the extraction of the desired information content from a received speech signal for subsequent use in activating or stimulating an implantable hearing prosthesis or for other purposes.
The variability of the speech signals produced by different speakers uttering the same words (as shown in FIG. 1) has been a major problem for speech scientists. Yet the human auditory system is able to extract the relevant speech information from these widely varying signals, and how it does so has baffled speech researchers for decades. The information must, of course, be present in the signal, but researchers in this field have so far been unable to devise a system that reliably extracts it.
Retrieving text from voice, which requires recognition of unrestricted speech, is still considered to be far beyond the current state of the art. Current efforts are directed instead at automatic recognition of words from restricted speech. Even so, the reliability of these ASR (Automatic Speech Recognition) systems is unpredictable: one report ("Selected Military Applications of ASR Technology" by Woodward J. P. & Cupper E. J., IEEE Communications Magazine, vol. 21, no. 9, December 1983, pp. 35-41) lists eighty different factors that can affect their reliability. Such advances in ASR as have been achieved have come more from improved electronics and microprocessor chips than from the development of any new ASR technology.
One type of prior art speech recognition technology, as exemplified by James L. Flanagan, Speech Analysis, Synthesis and Perception, Second Edition, Springer-Verlag, New York, 1972, identifies and tracks spectral peaks, or formants, in the speech signal to recognize phonemes or words. First, the time-domain speech signal is converted into a sequence of frequency-domain spectra, e.g., by a bank of bandpass filters or by computer Fourier transformation of successive time segments of the digitized speech signal. The dominant spectral peaks are then located in each spectrum and tracked across successive spectra. However, practical prior art systems are limited both in the number of words each can recognize and in the reliability with which phonemes or words are recognized. Moreover, each such system can generally recognize words reliably from only one speaker or a limited number of speakers.
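The generic prior-art procedure just described (segment, transform, pick dominant peaks, track them from spectrum to spectrum) can be sketched as follows. This is a minimal illustration only, not Flanagan's method and not the processing of the present invention: the function names, the naive DFT, the relative peak threshold `min_rel`, and the nearest-bin matching rule with `max_jump` are all hypothetical choices made for the sketch.

```python
import cmath
import math

# Hypothetical illustration: names and parameters are assumptions, not from the source.


def magnitude_spectrum(frame):
    """Magnitude of a naive DFT for bins 0..N/2 (O(N^2); adequate for a sketch).
    Bin k corresponds to k * sample_rate / len(frame) Hz."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def dominant_peaks(spectrum, max_peaks=3, min_rel=0.1):
    """Bins of the strongest local maxima, ignoring peaks below min_rel * max."""
    threshold = min_rel * max(spectrum)
    maxima = [
        k for k in range(1, len(spectrum) - 1)
        if spectrum[k] > spectrum[k - 1]
        and spectrum[k] >= spectrum[k + 1]
        and spectrum[k] >= threshold
    ]
    strongest = sorted(maxima, key=lambda k: spectrum[k], reverse=True)[:max_peaks]
    return sorted(strongest)


def track_peaks(signal, frame_len, hop, max_peaks=3, max_jump=2):
    """Split the signal into frames, pick the dominant peaks in each spectrum,
    and extend a track when a peak lies within max_jump bins of the track's
    peak in the previous frame; otherwise start a new track."""
    tracks = []  # each track is a list of (frame_index, bin)
    for f, start in enumerate(range(0, len(signal) - frame_len + 1, hop)):
        spectrum = magnitude_spectrum(signal[start:start + frame_len])
        for peak in dominant_peaks(spectrum, max_peaks):
            # Only tracks that ended in the previous frame can be extended;
            # a track already extended in this frame is no longer a candidate.
            open_tracks = [
                tr for tr in tracks
                if tr[-1][0] == f - 1 and abs(tr[-1][1] - peak) <= max_jump
            ]
            if open_tracks:
                nearest = min(open_tracks, key=lambda tr: abs(tr[-1][1] - peak))
                nearest.append((f, peak))
            else:
                tracks.append([(f, peak)])
    return tracks
```

For example, a 128-sample signal made of a sinusoid completing 4 cycles per 32 samples plus a weaker one completing 10 cycles per 32 samples, analysed with `frame_len=32` and `hop=32`, yields two steady tracks, one at bin 4 and one at bin 10 across all four frames. Real formant trackers must also cope with peaks that merge, split, or vanish, which is where the reliability limits noted above arise.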
In addressing this problem, the present inventors have considered the manner in which the auditory system handles widely varying speech signals and extracts the information required to make them intelligible. As speech sounds are transmitted through the auditory system to the higher centers of the brain, the signal undergoes several physiological processes.
When speech signals arrive at the middle ear, a mechanical gain control mechanism acts as an automatic gain control, limiting the dynamic range of the signal being analysed. According to the temporal-place representation, the discharge patterns of auditory nerve fibres show stronger phase locking to spectral peaks than to other harmonics of the stimulus. At physiological sound levels, synchrony to the dominant spectral peaks saturates and responses to the pitch harmonics are suppressed. As a result, the rough appearance of the pitch harmonics is masked out.
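Two of the effects described above have simple signal-processing analogues, sketched below purely for illustration. This is neither a model of the physiology nor the processing claimed by the invention. The first function is a feed-forward automatic gain control that compresses level differences; the second smooths a magnitude spectrum so that the ripple of the pitch harmonics is masked and only the envelope (formant) peaks remain. The function names, the fast-attack/slow-release envelope follower, all parameter values, and the assumption that the harmonic spacing in bins is already known are hypothetical.

```python
import math

# Hypothetical illustration: names and parameters are assumptions, not from the source.


def agc(signal, target=0.5, attack=0.5, release=0.01, max_gain=20.0):
    """Feed-forward automatic gain control: follow the signal envelope with a
    fast attack and slow release, then scale each sample toward `target`,
    never applying more than `max_gain`."""
    env = 0.0
    out = []
    for x in signal:
        mag = abs(x)
        env += (attack if mag > env else release) * (mag - env)
        gain = max_gain if env == 0 else min(max_gain, target / env)
        out.append(gain * x)
    return out


def smooth_harmonics(spectrum, spacing):
    """Convolve a magnitude spectrum with a triangular kernel of half-width
    `spacing` bins. For harmonics spaced `spacing` bins apart this linearly
    interpolates between harmonic amplitudes, masking the harmonic ripple."""
    n = len(spectrum)
    out = []
    for k in range(n):
        acc = 0.0
        for d in range(-spacing + 1, spacing):
            j = k + d
            if 0 <= j < n:
                acc += spectrum[j] * (1 - abs(d) / spacing)
        out.append(acc)
    return out


def local_maxima(values):
    """Indices of interior points greater than the left neighbour and not
    smaller than the right neighbour."""
    return [k for k in range(1, len(values) - 1)
            if values[k] > values[k - 1] and values[k] >= values[k + 1]]
```

In use, a quiet and a loud sinusoid differing tenfold in amplitude leave `agc` at nearly the same level once the envelope has settled. A harmonic spectrum with components every 8 bins under a single-peaked envelope shows a local maximum at every harmonic, whereas after `smooth_harmonics(spectrum, 8)` only the envelope peak survives. The triangular kernel was chosen because, at exactly the harmonic spacing, it reduces to linear interpolation between neighbouring harmonics.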