1. Field of the Invention
The invention relates to methods and apparatus for extracting the information content of audio signals, in particular audio signals associated with human speech.
2. Related Art
Conventional devices for extracting the information content from human speech are plagued with difficulties. Such devices, which include voice activated machines, computers and typewriters, typically seek to recognize, understand and/or respond to spoken language. Speech compressors seek to minimize the number of data bits required to encode digitized speech in order to minimize the cost of transmitting such speech over digital communication links. Hearing-aids seek to augment the hearing impaired's ability to extract information from speech and thus better understand conversations. Numerous other speech interpreting or responsive devices also exist.
As disclosed herein, the difficulties encountered by these devices and their resulting poor performance stem from the fact that they incorporate principles of operation that are wholly unlike the operating principles of the human ear. Since such devices fail to incorporate an information extraction principle similar to that found in the ear, they are incapable of extracting and representing speech information in an efficient manner.
Chappell, in "Filter Technique Offers Advantages for Instantaneous Frequency Measurement," published in Microwave System News and Communications Technology, June 1986, discloses the basic concept of channelized filter discriminators or ratio detectors. Chappell applies the technique to measuring the frequency of individual radar pulses rather than speech, and does not address the measurement and combination of harmonics for frequency diversity processing. In addition, Chappell uses Butterworth filters, which yield a non-linear frequency discriminator curve, rather than the Gaussian filters disclosed herein, which yield a perfectly linear discriminator curve or Gaussian/exact log discriminator curve.
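The linearity attributed to Gaussian filters can be checked with a short sketch. It assumes two adjacent filters with idealized Gaussian magnitude responses of equal width `sigma`, centered at `f1` and `f2`, and it takes the discriminator output to be the logarithm of the ratio of the two filter outputs. The function names and numeric values are illustrative assumptions, not details taken from this disclosure.

```python
import math

def gaussian_response(f, center, sigma):
    """Idealized Gaussian band-pass magnitude response (assumed model)."""
    return math.exp(-((f - center) ** 2) / (2.0 * sigma ** 2))

def log_ratio(f, f1, f2, sigma):
    """Log of the ratio of the upper filter output to the lower filter output."""
    return math.log(gaussian_response(f, f2, sigma) / gaussian_response(f, f1, sigma))

def estimate_frequency(ratio, f1, f2, sigma):
    """Invert the discriminator curve.

    For equal-width Gaussians the quadratic terms cancel, leaving
    log_ratio = (f2 - f1) * (f - (f1 + f2) / 2) / sigma**2,
    which is a straight line in f.
    """
    return 0.5 * (f1 + f2) + ratio * sigma ** 2 / (f2 - f1)

# Hypothetical filter pair and test tones (Hz).
F1, F2, SIGMA = 1000.0, 1200.0, 150.0
tones = [1000.0 + 20.0 * k for k in range(11)]
ratios = [log_ratio(f, F1, F2, SIGMA) for f in tones]
estimates = [estimate_frequency(r, F1, F2, SIGMA) for r in ratios]
```

Under these assumptions the log ratio changes by the same amount for every equal step in tone frequency, so the tone frequency is recovered by a single linear inversion. A Butterworth pair has no such cancellation, which is consistent with the non-linear curve noted above.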
Morlet, et al., in "Wavelet Propagation and Sampling Theory," published in Geophysics in 1982, disclose a filter bank with Gaussian filters equally spaced along a logarithmic frequency axis. The system is applied to seismic waves rather than speech, and does not address the measurement and combination of harmonics for frequency diversity processing.
Hartmann, in "Hearing a Mistuned Harmonic in an Otherwise Periodic Complex Tone," published in 1990 in the Journal of the Acoustical Society of America, and in Chapter 21 of Auditory Function, "Pitch Perception and the Segregation and Integration of Auditory Entities," describes the abilities of the auditory system to recognize and distinguish different sounds, but not how this is accomplished. The use of a frequency discrimination process to measure harmonic frequencies, and of a "pitch meter" that fits harmonic templates to frequency components resolved using conventional spectral analysis, is also disclosed. However, none of these references can account for the observed functional behavior of the human ear. In addition, none of the references discloses that the ear is primarily a modulation detector rather than a general-purpose sound detector, or that speech modulation uses a hybrid AM/FM signaling scheme with frequency diversity via harmonically related carriers. The reason why one's perception of pitch is logarithmic is that proper FM demodulation of harmonics requires band-pass filters with bandwidths proportional to their center frequencies, in a logarithmic relationship. Finally, there is no disclosure of a ratio detector.
Information encoded in signals can be extracted in numerous ways. Usually, the optimal way to extract information from signals is to employ the same approach used for encoding the information. The human ear does not appear to employ conventional data processing methods of extracting information from sound signals, such as methods using Fourier coefficients, Wavelet transform coefficients, linear prediction coefficients or other common techniques dependent on measurements of the sound signals themselves.
Human speech typically contains only about 100 bits of information per second of speech. Yet when speech is digitized with a 12-bit analog-to-digital converter at an 8,000 sample/second rate, the Nyquist limit for telephone (toll) quality speech, nearly 100,000 bits of data are obtained each second. Therefore, it should be possible to compress speech data by factors of up to 1000, reducing the number of data bits while still preserving all of the information. Despite intense research over many decades, the best compression factors achieved for telephone quality speech are only about 20, such as that obtained by the 4800 bits/second code-excited linear prediction (CELP) technique. Worse still, speech compression techniques with high compression factors are extremely complex and require a great deal of computation to implement.
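The figures in the preceding paragraph can be reproduced with a few lines of arithmetic. The sketch below uses only the numbers stated there; the constant and function names are illustrative.

```python
SAMPLE_RATE_HZ = 8_000      # telephone (toll) quality sampling rate
BITS_PER_SAMPLE = 12        # analog-to-digital converter resolution
SPEECH_INFO_BPS = 100       # approximate information content of speech
CELP_BPS = 4_800            # code-excited linear prediction bit rate

def compression_factor(raw_bps, coded_bps):
    """Ratio of the uncompressed bit rate to the coded bit rate."""
    return raw_bps / coded_bps

raw_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE                  # 96,000 bits/second
ideal_factor = compression_factor(raw_bps, SPEECH_INFO_BPS)  # 960, i.e. roughly 1000
celp_factor = compression_factor(raw_bps, CELP_BPS)          # 20
```

The raw rate of 96,000 bits/second is the "nearly 100,000" quoted above, and the gap between a factor of 960 and a factor of 20 is the shortfall the paragraph describes.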
The difficulties encountered in attempting to produce machines that compress or otherwise process speech signals are a direct result of a "which came first, the chicken or the egg" type of problem associated with audio perception. Information cannot be extracted from speech unless it is first known how the information is encoded within speech signals. On the other hand, understanding how the information is encoded is difficult if there is no practical means for recovering it. This situation has not significantly changed in more than one hundred years, since Hermann von Helmholtz tried, and failed, to explain how human hearing functions in terms of "resonators". Since that time, many theories of audio perception have been published, but none of them can account for most of the observed perceptual behavior of the auditory system. As a direct result of this lack of theoretical understanding, no machines have ever been built that perform in a manner remotely similar to the ear.
Thus, conventional approaches are often inaccurate and inefficient. The invention disclosed herein solves these problems by employing techniques more compatible with the operation of the human auditory system.