This invention relates to the quantitative measurement of the parameters of human speech, especially to those relating to the functioning of the human larynx. The invention is more specifically directed to a method and apparatus for processing oral or combined oral and nasal airflow waveforms to provide information concerning laryngeal flow. In voiced speech, vowel sounds are formed by a glottal source waveform on which are superimposed vocal tract resonances or formants. The glottis is the passageway between the vocal folds when they are open. The glottal waveform, which represents the volume velocity of the flow of air through the larynx as the vocal folds open and close in a quasi-periodic manner, normally has in each cycle an "open" portion which represents the outrush of air when the glottis is open, and a relatively flat, "closed" portion which represents the relative cessation of airflow when the vocal folds are closed along all or most of their length. The formants are resonances produced by the vocal tract--that is, the throat, mouth, sinuses, and nasal passages--that selectively amplify certain of the harmonics of the glottal waveform to produce the acoustic qualities associated with the various vowels and consonants of the language.
In very efficient voices, the glottal airflow waveform during a normal voiced speech sound will exhibit a slow rise when the glottis is opening, and then a sharp drop when the glottis closes, with a flat section between the open portions where there is little or no airflow occurring. In most persons, the waveform is much rounder than this ideal and more closely approximates a truncated sine wave, which results in a weaker voice for a given expenditure of airflow.
It is also true that in most persons the closed portion of the glottal wave is not absolutely flat, and where there is some problem preventing the vocal folds from closing completely, there may be a significant amount of airflow even during the closed portion. This can sometimes be heard as a breathiness or huskiness of voice. An otherwise unexplained increase in airflow during the closed portion over the level normal for a given speaker could also mean that there is a polyp or other growth in the larynx on or near the vocal folds. Though the details of the glottal air flow pattern can vary greatly between individuals, for the purpose of monitoring the efficiency of the vibration pattern of the vocal folds in modulating the air stream to produce acoustic energy, the parameters of the waveform that are now recognized to have the most significance are (1) the period of the vibratory cycle, which determines the vocal pitch, (2) the average air flow during that period, which determines the rate of deflation of the lungs, (3) the offset of the minimum value of the waveform from zero flow, which indicates the degree to which the vocal fold closure is incomplete, (4) the peak to peak amplitude of the waveform during the cycle, which reflects the mobility of vocal folds as they vibrate, and (5) the percentage of each glottal cycle during which there is a significantly increased air flow, which reflects directly the period during which the vocal folds are separated during their oscillatory cycle and indirectly the degree to which the vocal folds are pressed together (adducted) or held apart (abducted). Ideally, a health practitioner should be able to chart these voice parameters of a patient from week to week to show the progress of disease or of therapy.
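As an illustration only, the five parameters listed above can be read off a single cycle of a sampled glottal flow waveform as in the following Python sketch. It assumes that a glottal flow waveform is already available and has already been cut into exactly one vibratory cycle. The function name, the sampling rate, the synthetic test pulse, and in particular the 20% threshold used to decide when the flow is "significantly increased" are hypothetical choices made for this example; they are not taken from the text.

```python
import math


def glottal_cycle_parameters(flow, sample_rate_hz, open_threshold=0.2):
    """Summarize one glottal cycle of volume-velocity samples.

    flow: flow samples (e.g. litres/second) spanning exactly one cycle.
    open_threshold: hypothetical fraction of the peak-to-peak amplitude,
        measured up from the minimum, above which the flow is counted as
        "significantly increased".
    """
    n = len(flow)
    if n == 0:
        raise ValueError("flow must contain at least one sample")
    period_s = n / sample_rate_hz            # (1) period of the cycle
    mean_flow = sum(flow) / n                # (2) average airflow
    min_flow = min(flow)                     # (3) offset of minimum from zero
    peak_to_peak = max(flow) - min_flow      # (4) peak-to-peak amplitude
    level = min_flow + open_threshold * peak_to_peak
    open_samples = sum(1 for v in flow if v > level)
    return {
        "period_s": period_s,
        "pitch_hz": 1.0 / period_s,
        "mean_flow": mean_flow,
        "min_flow": min_flow,
        "peak_to_peak": peak_to_peak,
        "open_fraction": open_samples / n,   # (5) fraction of cycle "open"
    }


# Synthetic cycle (assumed values): a half-sine pulse of 0.4 L/s lasting
# 60 samples, followed by a flat closed portion, on a 0.05 L/s leakage offset.
example_flow = [
    0.05 + 0.4 * math.sin(math.pi * i / 60) if i < 60 else 0.05
    for i in range(100)
]
params = glottal_cycle_parameters(example_flow, sample_rate_hz=10000.0)
for name, value in params.items():
    print(f"{name}: {value:.4f}")
```

Charting the returned values from session to session is one way the week-to-week comparison described above might be carried out. The open fraction depends directly on the chosen threshold, so the same threshold would need to be used for every measurement being compared.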
The need for such standardized quantitative measures of the functioning of the larynx has long been recognized but has not been met. At present, there are only two non-invasive methods available for clinical use to estimate or describe the vibratory pattern of the vocal folds, namely electroglottography and inverse-filtering of the airflow or pressure waveform at the mouth.
Electroglottography basically measures the variation and degree of contact between vocal folds during that part of the glottal cycle in which the folds are in contact with each other. Changes in electrical resistance at the throat are measured by a device that contacts the patient's skin and indicates the change in resistance as the vocal folds meet and come into tighter contact. Unfortunately, this technique gives no absolute measure of the degree of contact of the vocal folds and no real information about the portion of the glottal cycle in which the vocal folds are out of contact, beyond the duration of that period.
In an inverse-filtering technique, either the pressure waveform outside the mouth, as recorded from a suitable microphone, or the waveform of the volume of airflow exiting the mouth or the mouth and nose combined, as recorded by a suitable pneumotachograph, is electronically filtered to produce the waveform of the volume airflow through the glottis. For clinical purposes, inverse-filtering the airflow rather than the pressure is preferred because only the airflow measurement results in a known zero level and permits ready calibration of the airflow scale of the resulting glottal flow waveform.
In the airflow inverse-filtering technique, a wire-screen peripheral flow pneumotachograph mask, such as the Rothenberg mask, produces an oral volume velocity waveform adequate for inverse-filtering. An electronic circuit or equivalent computer algorithm permits inverse-filtering of the mask output. "Inverse-filtering" here means that the major formants are removed from the oral waveform by an electronic filter having a frequency response that is the inverse of that of the vocal tract. Because the formant frequencies and bandwidths can change from one patient to another, depending not only on the internal geometry of the patient's vocal tract but also on the precise vowel being spoken, the adjustments of a manual formant inverse-filtering system must be carefully selected by a trained technician. Consequently, voice parameter analysis using this technique has been a rather arduous affair, conducted by only a few highly trained specialists.
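The idea of a filter whose response is the inverse of a vocal tract resonance can be sketched digitally as follows. This is a minimal Python illustration under stated assumptions, not the circuit or algorithm referred to above: each formant is modeled as a standard two-pole digital resonator, and the inverse filter is the matching anti-resonator. The formant frequencies and bandwidths, the sampling rate, and all function names are hypothetical. In practice the formant settings are exactly the per-patient, per-vowel adjustments that the text says must be selected by a trained technician.

```python
import math


def formant_coefficients(freq_hz, bandwidth_hz, sample_rate_hz):
    """Coefficients of a two-pole resonator with unity gain at zero frequency."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate_hz)
    theta = 2.0 * math.pi * freq_hz / sample_rate_hz
    b = 2.0 * r * math.cos(theta)
    c = -r * r
    a = 1.0 - b - c
    return a, b, c


def resonator(x, freq_hz, bandwidth_hz, sample_rate_hz):
    """Model of one formant: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    a, b, c = formant_coefficients(freq_hz, bandwidth_hz, sample_rate_hz)
    y, y1, y2 = [], 0.0, 0.0
    for v in x:
        out = a * v + b * y1 + c * y2
        y.append(out)
        y2, y1 = y1, out
    return y


def anti_resonator(x, freq_hz, bandwidth_hz, sample_rate_hz):
    """Inverse of resonator(): y[n] = (x[n] - b*x[n-1] - c*x[n-2]) / a."""
    a, b, c = formant_coefficients(freq_hz, bandwidth_hz, sample_rate_hz)
    y, x1, x2 = [], 0.0, 0.0
    for v in x:
        y.append((v - b * x1 - c * x2) / a)
        x2, x1 = x1, v
    return y


def inverse_filter(oral_flow, formants, sample_rate_hz):
    """Remove each (frequency, bandwidth) formant from the oral waveform in turn."""
    out = list(oral_flow)
    for freq_hz, bandwidth_hz in formants:
        out = anti_resonator(out, freq_hz, bandwidth_hz, sample_rate_hz)
    return out


# Demonstration with assumed values: shape a synthetic glottal pulse train
# with two formants, then recover it by inverse-filtering with the same settings.
SAMPLE_RATE_HZ = 10000.0
FORMANTS = [(700.0, 80.0), (1100.0, 90.0)]  # hypothetical (Hz, Hz) pairs

cycle = [0.4 * math.sin(math.pi * i / 60) if i < 60 else 0.0 for i in range(100)]
glottal_flow = cycle * 3

oral_flow = glottal_flow
for f, bw in FORMANTS:
    oral_flow = resonator(oral_flow, f, bw, SAMPLE_RATE_HZ)

recovered = inverse_filter(oral_flow, FORMANTS, SAMPLE_RATE_HZ)
max_error = max(abs(g - r) for g, r in zip(glottal_flow, recovered))
print(f"largest recovery error: {max_error:.3e}")
```

The recovery in this demonstration is essentially exact only because the inverse filter is given the same formant settings that shaped the signal. With a real voice those settings are unknown and must be tuned, which is the practical difficulty described above.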
Although this technique has been used by many voice research laboratories and some research-oriented facilities, the need to adjust the inverse-filter parameters for each subject, that is, to adjust the frequency and damping of the lowest two or three vocal tract resonances, has made this system impractical for general clinical use.
To overcome this problem, there have been many proposals to develop a computer-based automated inverse-filtering algorithm. Although these may eventually be of value, as presently proposed such automated inverse-filtering schemes can produce gross errors when the assumptions built into the program do not hold. Unfortunately, such errors are most likely to occur for abnormal voices, and such voices are quite often those of the patients who could benefit most from voice analysis. Such voices typically are breathy or have a significant nasality. Both of these conditions are contrary to the assumptions on which the automated inverse-filtering schemes would be predicated. Sophisticated schemes for automated inverse-filtering which are sufficiently robust to handle an adequately wide variety of voice conditions are not forthcoming.
Thus, to be of value for the clinical evaluation of vocal function, an ideal automatic voice parameter extracting technique should be able, for a wide range of normal and abnormal voice types, to produce reliable numerical estimates of the following: the duration of each glottal vibratory period; the maximum or peak value, the mean value, and the minimum value of the glottal volume velocity during that period; and an open quotient, that is, the ratio of the duration of the portion of the glottal cycle during which the airflow is significantly increased from its minimum value to the duration of the entire glottal vibratory period. However, prior to this invention, no suitable techniques had been developed for automatically extracting these parameters.