Speech analysis is currently performed by transforming analog signals, which represent pressure variations in the air, into frequency-related information. The pressure variations are the sound information transmitted between a speaker and a listener. The frequency information represents the excitation of the various cavities of the speaker's vocal tract by pulses from the vocal cords.
Current methods of speech analysis, together with the relevant properties of the speech signal, are described in printed publications as follows:
(1) Fourier transforms: U.S. Pat. No. 4,038,503 to Moshier;
(2) Filtering and zero crossing detection: Sambur and Rabiner, "A Speaker-Independent Digit Recognition System", Automatic Speech and Speaker Recognition, IEEE Press, 1978;
(3) Filtering: U.S. Pat. Nos. 3,646,576 to Thurston, and 3,304,369 to Dreyfus;
(4) Linear prediction: Makhoul, "Linear Prediction: A Tutorial Review", Speech Analysis, IEEE Press, 1979;
(5) Dynamic programming: Sakoe and Chiba, "Dynamic Programming Algorithm Optimization for Spoken Word Recognition", Automatic Speech and Speaker Recognition, IEEE Press, 1979;
(6) Properties of the Speech Signal: G. Fant, "The Acoustics of Speech", Speech Analysis, IEEE Press, 1979.
These prior art approaches to speech analysis have the disadvantage of being pitch-dependent; that is, the components of the frequency transforms are affected by the pitch of the individual speaker's voice. As a result, it is difficult to make a device that uses the analysis, e.g., a computer, respond accurately to many different individuals (particularly both adults and children) and to a widely varied vocabulary without considerable adjustment and effort. In fact, no commercially available method is currently capable of recognizing a large vocabulary independently of the speaker.
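The pitch dependence described above can be illustrated with a small numerical sketch. It is not taken from any of the cited publications. The signal model (a single decaying resonance at 700 Hz, re-excited once per pitch period), the 8 kHz sample rate, the 50 ms window, and all function names are assumptions chosen only for illustration. The same resonance is excited at two different pitches and a fixed-window Fourier transform is applied to each.

```python
import cmath
import math

FS = 8000   # sample rate in Hz (assumed for this sketch)
N = 400     # fixed analysis window: 50 ms, i.e. 20 Hz between DFT bins

def voiced_frame(pitch_hz, formant_hz=700.0, tau_s=0.005):
    """Hypothetical voiced sound: one decaying resonance re-excited every pitch period."""
    period = round(FS / pitch_hz)
    total = 2 * N                      # synthesize extra so the window is in steady state
    x = [0.0] * total
    for start in range(0, total, period):
        for n in range(start, total):
            t = (n - start) / FS
            x[n] += math.exp(-t / tau_s) * math.sin(2 * math.pi * formant_hz * t)
    return x[-N:]

def magnitude_spectrum(frame):
    """Plain DFT magnitudes for bins 0 .. n/2 (slow, but dependency-free)."""
    n = len(frame)
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(frame)))
            for k in range(n // 2 + 1)]

def peak_hz(frame):
    """Frequency of the strongest spectral component in the frame."""
    mags = magnitude_spectrum(frame)
    k = max(range(len(mags)), key=mags.__getitem__)
    return k * FS / len(frame)

low = peak_hz(voiced_frame(100.0))    # lower-pitched voice
high = peak_hz(voiced_frame(200.0))   # higher-pitched voice, same resonance
print(low, high)
```

Because the window spans several pitch periods, the transform has significant energy only at multiples of the pitch frequency. With a 100 Hz pitch the 700 Hz resonance coincides with a harmonic and is reported directly. With a 200 Hz pitch no harmonic falls at 700 Hz, so the strongest component is displaced to a neighboring harmonic even though the resonance itself is unchanged.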
The prior art does not recognize that the human ear-brain extracts its recognition information for voiced sounds from the frequency and decay patterns which immediately follow each excitation of the vocal tract by the vocal cords, and essentially ignores the degraded information as the excitation dies away. Consequently, the prior art has failed to recognize that a proper examination of the speech waveform, one that matches the human intelligence system, must be synchronized with the excitation or glottal pulses.
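As a rough illustration of what examining the waveform in synchronism with the glottal pulses could look like, the sketch below measures only the 3 ms that immediately follow each excitation. It is not presented as the method of any cited reference or of any particular system. The synthetic signal, the known pulse positions, the zero-crossing estimator, and every name in the code are assumptions made for the example.

```python
import math

FS = 8000  # sample rate in Hz (assumed for this sketch)

def synth(pitch_hz, formant_hz=700.0, tau_s=0.005, n_samples=800):
    """Hypothetical voiced sound plus the sample index of each excitation pulse."""
    period = round(FS / pitch_hz)
    pulses = list(range(0, n_samples, period))
    x = [0.0] * n_samples
    for start in pulses:
        for n in range(start, n_samples):
            t = (n - start) / FS
            x[n] += math.exp(-t / tau_s) * math.sin(2 * math.pi * formant_hz * t)
    return x, pulses

def crossing_times(segment):
    """Zero-crossing instants (in samples), linearly interpolated between samples."""
    times = []
    for i in range(len(segment) - 1):
        a, b = segment[i], segment[i + 1]
        if (a < 0) != (b < 0):
            times.append(i + a / (a - b))
    return times

def resonance_hz(x, pulse, window=24):
    """Ringing frequency estimated from the 24 samples (3 ms) after one pulse."""
    t = crossing_times(x[pulse:pulse + window])
    half_periods = len(t) - 1
    return half_periods * FS / (2 * (t[-1] - t[0]))

def mean_estimate(pitch_hz):
    """Average the per-pulse estimates, skipping the start-up transient."""
    x, pulses = synth(pitch_hz)
    usable = [p for p in pulses if p >= 400 and p + 24 <= len(x)]
    return sum(resonance_hz(x, p) for p in usable) / len(usable)

est_low = mean_estimate(100.0)    # lower-pitched voice
est_high = mean_estimate(200.0)   # higher-pitched voice, same resonance
print(est_low, est_high)
```

In this idealized model each measurement window begins at an excitation and ends before the next one, so the estimate reflects the ringing of the resonance rather than the spacing of the pulses, and both pitches yield an estimate close to 700 Hz. Real speech would additionally require locating the glottal pulses, which this sketch assumes are already known.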