1. Field of the Invention
The present invention relates to (i) the determination of vocal tract acoustic properties by analysis of speech waveforms and (ii) the classification of speech waveforms according to vocal tract transfer functions, so as to associate each waveform segment with a given articulatory condition.
2. Description of the Prior Art
It has long been known that vowels are roughly characterized by their patterns of resonances, usually called formants. Depending on the researcher, a vowel was held to be accurately characterized by one, two, three, or more formants. In the 1940s, equipment was devised that displayed formants and their movements during the articulation of all types of speech. Formants were observed not only in vowels but in most or all speech elements or phonemes. Extensive data were published on the average formant frequencies and relative intensities for men, women and children, covering the first three formants, which the authors believed adequate to represent all speech. Pattern-matching methods were devised for comparing formant frequencies and movements against known stored references. These efforts have continued to the present time with limited success, although they have proved useful in certain applications.
More sophisticated means of processing formant patterns have made very slow progress toward the promise of machine speech recognition. In parallel with overall spectral-matching methods, formant-tracking techniques were devised in which formant peaks are tracked by electronic circuitry or computer programs. Formant frequency, and in some cases amplitude, is converted to voltage or graphic form for further matching and analysis. Also in parallel with these efforts, experiments were conducted in direct waveform matching (cross-correlation), indirect time-waveform matching (autocorrelation), and time-waveform feature extraction, for example the voiced/unvoiced decision, zero-crossing rate, symmetry, and the envelope and its slope. These methods have produced limited success in restricted applications but have not achieved the outstanding success that experimenters had hoped for.
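As a rough illustration of the time-waveform measures named above, the following Python sketch computes a zero-crossing rate, a normalized autocorrelation, and the peak normalized cross-correlation of a signal against a stored reference waveform. It is a minimal sketch assuming the signal is a plain list of floating-point samples; the function names are hypothetical and do not describe any particular prior-art apparatus.

```python
import math
from typing import List, Sequence


def zero_crossing_rate(x: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(x) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(x) - 1)


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Autocorrelation r[k] for k = 0..max_lag, normalized so that r[0] == 1."""
    n = len(x)
    r = [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]
    return [v / r[0] for v in r] if r[0] else r


def peak_cross_correlation(x: Sequence[float], ref: Sequence[float]) -> float:
    """Largest normalized cross-correlation of x with a stored reference,
    taken over every lag at which the reference fits entirely inside x."""
    ref_norm = math.sqrt(sum(v * v for v in ref))
    best = 0.0
    for lag in range(len(x) - len(ref) + 1):
        seg = x[lag:lag + len(ref)]
        seg_norm = math.sqrt(sum(v * v for v in seg))
        if seg_norm == 0 or ref_norm == 0:
            continue
        c = sum(a * b for a, b in zip(seg, ref)) / (seg_norm * ref_norm)
        best = max(best, c)
    return best


if __name__ == "__main__":
    # A sinusoid with 8 samples per cycle, offset by half a sample so no sample is exactly zero.
    x = [math.sin(2 * math.pi * (n + 0.5) / 8) for n in range(64)]
    print(zero_crossing_rate(x))
    print(autocorrelation(x, 8))
    print(peak_cross_correlation(x, x[8:16]))
```

Note that all three measures operate on the time waveform directly; none of them separates the contribution of the vocal tract from that of the voice source.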
More recently, research has turned toward linear predictive coding methods. Their popularity seems to arise from the ease with which they can be implemented on a computer. These lines of research essentially duplicate work previously done with electrical hardware and analyzed by Fourier and Laplace transform methods. Finally, one line of recent research, illustrated by Moshier, U.S. Pat. No. 3,610,831, has used weighted and summed delayed speech signals to achieve, in one case, a rudimentary inverse-filter recognition method.
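Both the predictor of linear predictive coding and the inverse-filter idea mentioned above amount to forming a weighted sum of delayed samples. The Python sketch below illustrates this under stated assumptions and is not a reconstruction of Moshier's apparatus: a single vocal tract resonance is modeled as a hypothetical two-pole filter, and a filter whose weights are 1, -a1 and -a2 on the current and two delayed samples undoes that resonance and recovers the excitation impulses. In linear predictive coding the weights would instead be estimated from the speech itself; here they are taken from the known model.

```python
import math
from typing import List, Sequence, Tuple


def resonator_coefficients(formant_hz: float, bandwidth_hz: float, fs: float) -> Tuple[float, float]:
    """Coefficients (a1, a2) of a two-pole resonator at formant_hz with the given bandwidth."""
    r = math.exp(-math.pi * bandwidth_hz / fs)  # pole radius
    theta = 2 * math.pi * formant_hz / fs       # pole angle
    return 2 * r * math.cos(theta), -r * r


def all_pole_resonator(x: Sequence[float], a1: float, a2: float) -> List[float]:
    """y[n] = x[n] + a1*y[n-1] + a2*y[n-2], with zero initial conditions."""
    y: List[float] = []
    for n, v in enumerate(x):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(v + a1 * y1 + a2 * y2)
    return y


def inverse_filter(y: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Weighted sum of delayed samples: e[n] = sum_k weights[k] * y[n - k]."""
    return [
        sum(w * y[n - k] for k, w in enumerate(weights) if n - k >= 0)
        for n in range(len(y))
    ]


if __name__ == "__main__":
    fs = 8000.0                                           # assumed sampling rate
    a1, a2 = resonator_coefficients(500.0, 60.0, fs)      # assumed formant and bandwidth
    x = [1.0 if n % 80 == 0 else 0.0 for n in range(400)]  # impulses every 80 samples
    y = all_pole_resonator(x, a1, a2)
    e = inverse_filter(y, [1.0, -a1, -a2])
    print(max(abs(a - b) for a, b in zip(e, x)))          # residual after inverse filtering
```

Because the weights exactly cancel the modeled resonance, the output reduces to the impulse pattern of the source, whatever its spacing.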
Research and development in speech has followed lines of analysis and characterization that were used extensively in developing communication systems based on amplitude modulation, frequency modulation, suppressed-carrier and single-sideband transmission, and various types of pulse modulation. Speech, however, belongs to a class of signals that may be called cavity modulation, about which little mention has been found in the communications literature.
The above-mentioned speech recognition methods generally rely on concepts devised for the more conventional communication techniques. As a result, vowels are approximated as periodic waveforms, and speech sounds are characterized only by their power or amplitude spectra. Such techniques neither explain nor cope with the many different waveforms that can occur from one utterance of a phoneme to another because of differences in pitch. An important source of this waveform variability is believed to be the superposition effect, in which the basic waveforms produced by the vocal tract carry over and overlap, causing spectral differences that depend on the pattern of voice-source impulses as well as on the vocal tract configuration. Consequently, different articulatory combinations are not easily classified by relying on the power or amplitude spectra.
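The superposition effect described above can be illustrated with a toy model. The Python sketch below rests on assumptions chosen only for illustration: an 8 kHz sampling rate, a single formant at 500 Hz with a 60 Hz bandwidth modeled as an exponentially damped sinusoid, and evenly spaced voice-source impulses at two different pitch periods. The helper names are hypothetical. Each new response starts before the previous one has died away, so the waveform in later periods is a sum of overlapping responses, and the power at the formant frequency changes with pitch even though the vocal tract model is identical.

```python
import cmath
import math
from typing import List, Sequence

FS = 8000  # sampling rate in Hz, assumed for illustration


def resonance_impulse_response(formant_hz: float, bandwidth_hz: float, length: int) -> List[float]:
    """Hypothetical single-formant vocal tract model: an exponentially damped sinusoid."""
    decay = math.exp(-math.pi * bandwidth_hz / FS)
    w = 2 * math.pi * formant_hz / FS
    return [decay ** n * math.sin(w * n) for n in range(length)]


def excite(h: Sequence[float], period: int, length: int) -> List[float]:
    """Superpose one copy of h at each voice-source impulse, spaced `period` samples apart."""
    y = [0.0] * length
    for start in range(0, length, period):
        for k, v in enumerate(h):
            if start + k >= length:
                break
            y[start + k] += v  # later copies add onto the tails of earlier ones
    return y


def power_at(y: Sequence[float], freq_hz: float) -> float:
    """Squared magnitude of the discrete-time Fourier transform of y at one frequency."""
    w = 2 * math.pi * freq_hz / FS
    s = sum(v * cmath.exp(-1j * w * n) for n, v in enumerate(y))
    return abs(s) ** 2


if __name__ == "__main__":
    h = resonance_impulse_response(500.0, 60.0, 400)
    low = excite(h, 80, 800)   # 100 Hz pitch
    high = excite(h, 62, 800)  # roughly 129 Hz pitch
    print(low[:62] == high[:62])                      # first period is the bare response in both cases
    print(power_at(low, 500.0), power_at(high, 500.0))  # formant-frequency power differs with pitch
```

In this model the first pitch period contains only the vocal tract response and is the same for both pitches, while the power spectrum of the whole segment is not.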
There is a need for classifiers of articulatory condition that can operate reliably in real time, i.e. classify at articulatory rates, in apparatus that could respond reliably and accurately to verbal commands. Such classifiers may also permit transmission of speech over channels restricted in bandwidth, time, or both, wherein articulatory categories would be transmitted and subsequently reconverted to speech at the receiving end. It is believed that the present invention, since it classifies sounds according to vocal tract patterns without regard to pitch, is superior to prior-art techniques based on the power spectrum, which varies with pitch, and will therefore make possible the realization of effective and practical speech recognition systems.