Many different types of processing arrangements have been devised to analyze sensory information. With respect to sensory signals derived from sounds such as speech, some processing systems extract specific features such as pitch, formants, or linear predictive parameters to detect, recognize, enhance or synthesize the speech or sounds. Other systems are adapted to form frequency spectra directly from the speech wave. It is generally agreed that the human hearing apparatus does not process speech waves in these or similar ways and that human perception of speech for recognition or other purposes is superior to such automatic processing systems.
Little is known about the processing principles in the brain stem, auditory nuclei and the auditory cortex. It is well recognized, however, that sound waves entering the ear cause hair cells in the cochlea to vibrate, and that the sound waves are represented at the cochlear nucleus solely by the auditory nerve firing patterns caused by the hair cells in the cochlea. Such knowledge has been utilized, as described for example in U.S. Pat. No. 4,532,930 issued to Peter A. Crosby et al. on Aug. 6, 1985, to provide an auditory prosthesis for profoundly deaf persons. It is further known that human understanding of speech in the presence of noise is very good in comparison to automated recognition arrangements, whose performance deteriorates rapidly as the noise level increases. Consequently, it has been suggested in the article "Recognition system processes speech the way the ear does" by J. R. Lineback appearing in Electronics, vol. 57, No. 3, Feb. 9, 1984, pp. 45-46, and elsewhere, that speech analysis may be modeled on the auditory nerve firing patterns of the human hearing apparatus.
U.S. Pat. No. 4,536,844 issued to Richard F. Lyon on Aug. 20, 1985, discloses a method and apparatus for simulating aural response information which are based on a model of the human hearing system and the inner ear and wherein the aural response is expressed as signal processing operations that map acoustic signals into neural representations. In that arrangement, the human ear is simulated by a high-order transfer function modeled as a cascade/parallel filter bank network of simple linear, time-invariant filter sections, with signal transduction and compression based on half-wave rectification and a nonlinearly coupled, variable-time-constant automatic gain control network. These processing arrangements, however, do not correspond to the nerve firing patterns characteristic of aural response.
U.S. Pat. No. 4,075,423 issued to M. J. Martin et al. on Feb. 21, 1978 discloses sound analyzing apparatus for extracting basic formant waveforms present in a speech signal, and examining the formant waveforms to identify the frequency components thereof using a histogram of the frequency patterns of detected waveform peaks developed over successive sampling periods in a digital processor. The Martin et al. arrangement, however, is limited to forming a particular set of acoustic features, i.e., formants, and does not address the problem of utilizing the information available in the time differences of level crossings to characterize the acoustic wave more fully than the generation of the few formants there disclosed. In particular, the Martin et al. arrangement treats each of the frequency sub-band components of the acoustic wave completely separately. Others have employed techniques somewhat similar to the techniques of the Martin et al. patent and have also limited their analysis to formant extraction. See the article by Russell J. Niederjohn et al., "A Zero-Crossing Consistency Method for Formant Tracking of Voiced Speech in High Noise Levels", IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-33, No. 2, Apr. 1985, the article by M. Elghonemy et al., "An Iterative Method for Formant Extraction Using Zero-Crossing Interval Histograms", Melecon '85, vol. II, Digital Signal Processing, A. Luque et al. (eds.), Elsevier Science Publishers B. V. (North-Holland) 1985, and the article of one of us, O. Ghitza, "A Measure of In-Synchrony Regions in the Auditory Nerve Firing Patterns as a Basis for Speech Vocoding", International Conference, Acoustics, Speech and Signal Processing, '85, Tampa, Fla., Mar. 26-29, 1985.
In the latter article the analysis is advanced, with respect to the different frequency sub-band components of the acoustic wave, by a nonlinear combination thereof which picks "dominant frequencies" when present in at least 6 adjacent bands and suppresses other distributional information regarding the crossing time differences. We now believe that this process causes the loss of valuable information regarding the input bandlimited signal, and that an analysis (a multiplicative nonlinear process) as employed in the article by the other of us, J. B. Allen, "Cochlear Modeling", IEEE ASSP Magazine, January, 1985, has disadvantages in characterizing the input bandlimited signal. It is an object of the invention to provide an improved spectral representation of the neural response to sensory patterns that simulates the operation of biological organs and to adapt the technique to the processing of bandlimited signals generally.
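By way of illustration only, the general idea of a crossing-interval histogram referred to in the prior-art discussion above may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from any of the cited patents or articles: the function names, the 10 Hz bin width, the 8000 Hz sampling rate, and the single 500 Hz test tone are all hypothetical choices made for the example. Upward zero crossings of a single bandlimited signal are located by linear interpolation, the inverse of each interval between successive crossings is treated as a frequency estimate, and those estimates are accumulated in a histogram.

```python
import math
from collections import Counter


def upward_zero_crossings(samples, sample_rate):
    """Return interpolated times (seconds) of upward zero crossings."""
    times = []
    for i in range(1, len(samples)):
        a, b = samples[i - 1], samples[i]
        if a < 0.0 <= b:
            # Linear interpolation between the two samples that straddle zero.
            frac = -a / (b - a)
            times.append((i - 1 + frac) / sample_rate)
    return times


def interval_histogram(times, bin_width_hz=10.0):
    """Histogram of inverse crossing intervals, keyed by frequency bin (Hz)."""
    hist = Counter()
    for t0, t1 in zip(times, times[1:]):
        freq = 1.0 / (t1 - t0)
        hist[round(freq / bin_width_hz) * bin_width_hz] += 1
    return hist


# Hypothetical test input: a pure 500 Hz tone sampled at 8000 Hz.
sample_rate = 8000
tone_hz = 500.0
signal = [
    math.sin(2.0 * math.pi * tone_hz * n / sample_rate + 0.3)
    for n in range(800)
]

crossing_times = upward_zero_crossings(signal, sample_rate)
hist = interval_histogram(crossing_times)
dominant_hz = max(hist, key=hist.get)  # bin holding the most intervals
```

For a single tone all intervals fall in one bin. For a speech sub-band the histogram spreads over several bins, and it is this distributional information about the crossing time differences that the preceding discussion is concerned with.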