Devices for analyzing speech waveforms have application in assisting the deaf and in narrow bandwidth communications. For both applications, each speech utterance, i.e., phoneme, is coded into a different signal, whereby each phoneme has a unique relationship to the coded signal. To assist the deaf, the unique phoneme-to-signal relationship is utilized to activate an indicator, usually visual, that the deaf can perceive. For narrow bandwidth communication systems, the speech signal is transformed into phoneme-indicating signals having a bandwidth that is less than approximately 100 bits per second.
Prior art speech analyzers have generally fallen into one of three categories, each of which appears to have certain deficiencies. One of the most commonly employed prior art devices has used detectors for determining when a speech waveform crosses a predetermined amplitude, typically the average, or zero, value of the waveform. Devices of this nature are often referred to as zero crossing detectors since they derive pulse outputs in response to the waveform crossing the zero value. Typically, the number of pulses derived over a predetermined time interval provides an indication of the frequency of the speech waveform. Zero crossing detectors have a tendency to respond only to the frequency component having the highest amplitude, particularly when one frequency component has an amplitude that is much higher than any of the other frequency components. For the first formant (typically 270-730 Hertz), where there is appreciable, important information in frequency components having lower amplitudes than a peak component, this tendency may result in serious loss of information. If two or more frequencies have approximately the same amplitude, the zero crossing detector has a tendency to capture either the highest frequency or the lowest frequency in the waveform, depending upon adjustments made to the zero crossing detector. By responding to or capturing only the highest or lowest frequency, the prior art devices have not been well suited to provide accurate information for speakers having widely differing glottal or fundamental frequencies, as exist between men, women and children.
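The tendency of a zero crossing detector to follow the dominant component can be illustrated numerically. The following is a minimal sketch in Python, not a description of any particular prior art circuit. The function names, the 8000 Hz sampling rate, and the 300 Hz and 2000 Hz test tones are all assumptions chosen for illustration.

```python
import math


def zero_crossing_frequency(samples, sample_rate):
    """Estimate frequency from the number of sign changes in the samples.

    A sinusoid crosses zero twice per cycle, so the estimate is
    crossings / (2 * duration). Hypothetical helper for illustration.
    """
    crossings = 0
    for prev, cur in zip(samples, samples[1:]):
        # Count a crossing whenever adjacent samples differ in sign.
        if (prev < 0) != (cur < 0):
            crossings += 1
    duration = (len(samples) - 1) / sample_rate
    return crossings / (2.0 * duration)


def two_tone(f1, a1, f2, a2, sample_rate=8000, seconds=1.0):
    """Sum of two sinusoids with frequencies f1, f2 and amplitudes a1, a2."""
    n = int(sample_rate * seconds)
    return [
        a1 * math.sin(2 * math.pi * f1 * t / sample_rate)
        + a2 * math.sin(2 * math.pi * f2 * t / sample_rate)
        for t in range(n)
    ]


if __name__ == "__main__":
    # Assumed test case: a strong 300 Hz tone with a weak 2000 Hz tone,
    # then the same two tones with the amplitudes exchanged.
    low_dominant = two_tone(300, 1.0, 2000, 0.1)
    high_dominant = two_tone(300, 0.1, 2000, 1.0)
    print(zero_crossing_frequency(low_dominant, 8000))
    print(zero_crossing_frequency(high_dominant, 8000))
```

Under these assumptions the first estimate should fall close to 300 Hz and the second close to 2000 Hz. In each case the weaker component leaves essentially no trace in the crossing count, which is the loss of information described above.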
Another type of prior art speech analyzer has employed relatively complex apparatus for analyzing the speech spectrum in raw form. Such analyzers typically employ a bank of many parallel bandpass filters responsive to a speech source. Each filter supplies energy in a relatively narrow pass band to an associated amplitude detector and the detectors drive relatively complex processing circuitry. It has been found that such analyzers, in addition to being relatively complex, suffer from the deficiency of providing excessive information. The amount of information derived is often so great that difficulties arise in coding the resultant information into an indication of the uttered phoneme. A further deficiency in spectrum analyzers is that they do not consider phase information of the different components that form a phoneme. Instead, there is derived a d.c. signal indicative of the phoneme amplitude.
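The loss of phase information in an amplitude-detecting filter bank can be sketched in the same way. The example below is an assumption-laden stand-in rather than a model of any actual analyzer: each bandpass filter and its amplitude detector are replaced by a single-frequency Fourier magnitude, and the function names, center frequencies, and test signals are hypothetical.

```python
import math


def band_amplitudes(samples, sample_rate, centers):
    """Return one amplitude per center frequency.

    Each value is a single-bin Fourier magnitude, used here as a
    stand-in for a narrow bandpass filter followed by an amplitude
    detector. Phase is discarded when the magnitude is taken.
    """
    n = len(samples)
    amplitudes = []
    for fc in centers:
        re = sum(x * math.cos(2 * math.pi * fc * i / sample_rate)
                 for i, x in enumerate(samples))
        im = sum(x * math.sin(2 * math.pi * fc * i / sample_rate)
                 for i, x in enumerate(samples))
        amplitudes.append(2.0 * math.hypot(re, im) / n)
    return amplitudes


def tone_sum(components, sample_rate=8000, n=800):
    """Sum sinusoids given as (frequency_hz, amplitude, phase_rad) tuples."""
    return [
        sum(a * math.sin(2 * math.pi * f * i / sample_rate + p)
            for f, a, p in components)
        for i in range(n)
    ]


if __name__ == "__main__":
    centers = [300, 600, 900]  # assumed band centers in Hz
    # Same component amplitudes, different component phases.
    signal_a = tone_sum([(300, 1.0, 0.0), (600, 0.5, 0.0)])
    signal_b = tone_sum([(300, 1.0, math.pi / 2), (600, 0.5, math.pi / 3)])
    print(band_amplitudes(signal_a, 8000, centers))
    print(band_amplitudes(signal_b, 8000, centers))
```

The two test signals have different waveshapes because their components differ in phase, yet the per-band amplitudes printed for them should agree to within rounding. Any distinction carried by phase is therefore unavailable to circuitry that sees only the detector outputs.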
The third type of proposed speech analyzer is capable of "learning" the characteristics of different speakers. Such systems, however, must generally be programmed for each individual speaker and are not usually adapted to analyze the speech of a wide variety of speakers whose speech patterns have not been programmed into a memory of the analyzer.