1. Field of Invention
This invention relates generally to signal processing and, more particularly, to the analysis of sound based on models of human audition. Specifically, the invention relates to a method and apparatus for use in high-quality speech detection and recognition.
It has been pointed out that to understand the hearing process is to understand the cochlea. Moreover, it is generally recognized that sounds are best characterized in a frequency domain and that the cochlea performs the job of transforming the incoming time-domain pressure signal into that domain. The exact nature of this frequency domain has not been well clarified, however, and this has led to some misunderstandings about the so-called frequency domain associated with aural perception. Ohm's acoustic law is particularly misleading in that it asserts that the ear is insensitive to phase. Representations such as smoothed filterbank envelopes, linear predictive coding spectra and the like have never been able to distinguish successfully between complex single sounds and separate unfusible sounds with similar short-term spectra. As a consequence, speech and other sounds have been extremely difficult to decode reliably, and the widespread need for reliable sound and speech recognition systems has gone unmet.
2. Description of the Prior Art
Typical prior art speech recognition methods and apparatus have been modeled on the assumption that the ear is relatively insensitive to phase, or to small values of group delay. Current speech analysis techniques fail to deal effectively with sounds other than pure, simple speech sounds.
Many cochlear models have been suggested in the past. Most model only the mechanical motion of the basilar membrane, with varying degrees of fidelity. Some hearing models include a "second filter" of various sorts, transduction nonlinearities and simple compression mechanisms. See, for example, Allen, J. B., "Cochlear Modeling-1980," ICASSP 81, pp. 766-789, Atlanta, 1981; Nilsson, H. G., "A Comparison of Models for Sharpening of Frequency Selectivity in the Cochlea," Biological Cybernetics 28, pp. 177-181, 1978; Schroeder et al., "Model for Mechanical to Neural Transduction of the Auditory Receptor," JASA 55, pp. 1055-1060, 1974; and Kim et al., "A Population Study of Cochlear Nerve Fibers: Comparison of Spatial Distributions of Average-Rate and Phase-Locking Measures of Responses to Single Tones," Journal of Neurophysiology 42, pp. 16-30, 1979.
Much work has been done on the mechanical modeling of the cochlea, although little of it has been applied to the field of speech analysis. See, for example, Zwislocki, J. J., "Sound Analysis in the Ear: A History of Discoveries," American Scientist, 69, pp. 184-192, 1981; Matthews, J. W., "Mechanical Modeling of Non-Linear Phenomena Observed in the Peripheral Auditory System," Doctor of Science Thesis, Washington University, St. Louis, Mo. 1980; Neely, S. T., "Fourth-Order Partition Dynamics for a Two-Dimensional Model of the Cochlea," Doctor of Science Thesis, Washington University, St. Louis, Mo. 1981; Zweig et al., "The Cochlear Compromise," JASA 59, pp. 975-982, 1976; Schroeder, M. R., "An Integrable Model for the Basilar Membrane," JASA 53, pp. 429-434, 1973; and Zweig, "Basilar Membrane Motion," Cold Spring Harbor Symposia on Quantitative Biology, Volume XL, pp. 619-633 (Cold Spring Harbor Laboratory, 1976).